Author: Indranil Ghosh
Title: Football (soccer) data analysis: A pedagogic introduction
Institute: School of Fundamental Sciences, Massey University
Twitter: @indraghosh314
Website: https://indrag49.github.io/
Date: 12-09-2021
This talk teaches these simple concepts to those who want to start working on football data analysis:
How to get open access event data from StatsBomb using statsbombpy,
How to draw a soccer pitch using mplsoccer,
How to visualize a pass network for a particular team in a particular match,
How to use NetworkX module to analyze the pass network,
How to draw pass maps along with their corresponding heat maps, and
How to implement computational geometric concepts like Convex Hulls, Voronoi diagrams, and Delaunay triangulations using the Python package scipy.spatial on football event and tracking data
statsbombpy
Use pip to install statsbombpy with the following command:
pip install statsbombpy
The open data from StatsBomb can be accessed without authentication, but it is always advisable to read the Terms & Conditions section on their documentation page.
Next, import sb from the statsbombpy package:
from statsbombpy import sb
We also import the numpy and pandas packages, which help us manipulate our datasets and perform tasks like data cleaning and data extraction:
import numpy as np
import pandas as pd
comp = sb.competitions()
credentials were not supplied. open data access only
The first 15 rows of comp look like this:
comp.head(15)
| | competition_id | season_id | country_name | competition_name | competition_gender | season_name | match_updated | match_available |
---|---|---|---|---|---|---|---|---|
0 | 16 | 4 | Europe | Champions League | male | 2018/2019 | 2021-04-19T17:36:05.724116 | 2021-04-19T17:36:05.724116 |
1 | 16 | 1 | Europe | Champions League | male | 2017/2018 | 2021-01-23T21:55:30.425330 | 2021-01-23T21:55:30.425330 |
2 | 16 | 2 | Europe | Champions League | male | 2016/2017 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
3 | 16 | 27 | Europe | Champions League | male | 2015/2016 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
4 | 16 | 26 | Europe | Champions League | male | 2014/2015 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
5 | 16 | 25 | Europe | Champions League | male | 2013/2014 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
6 | 16 | 24 | Europe | Champions League | male | 2012/2013 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
7 | 16 | 23 | Europe | Champions League | male | 2011/2012 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
8 | 16 | 22 | Europe | Champions League | male | 2010/2011 | 2020-07-29T05:00 | 2020-07-29T05:00 |
9 | 16 | 21 | Europe | Champions League | male | 2009/2010 | 2020-07-29T05:00 | 2020-07-29T05:00 |
10 | 16 | 41 | Europe | Champions League | male | 2008/2009 | 2020-08-30T10:18:39.435424 | 2020-08-30T10:18:39.435424 |
11 | 16 | 39 | Europe | Champions League | male | 2006/2007 | 2021-03-31T04:18:30.437060 | 2021-03-31T04:18:30.437060 |
12 | 16 | 37 | Europe | Champions League | male | 2004/2005 | 2021-04-01T06:18:57.459032 | 2021-04-01T06:18:57.459032 |
13 | 16 | 44 | Europe | Champions League | male | 2003/2004 | 2021-04-01T00:34:59.472485 | 2021-04-01T00:34:59.472485 |
14 | 16 | 76 | Europe | Champions League | male | 1999/2000 | 2020-07-29T05:00 | 2020-07-29T05:00 |
Let us inspect the columns of comp to understand the dataset better and draw out relevant information from it. Type the following:
print(comp.columns)
Index(['competition_id', 'season_id', 'country_name', 'competition_name', 'competition_gender', 'season_name', 'match_updated', 'match_available'], dtype='object')
These are the columns of the comp dataset. For example, if we look at the row where competition_id is 16 and season_id is 1, we notice that country_name is Europe, competition_name is Champions League, season_name is 2017/2018, and so on. Suppose we are satisfied with this information and want to analyze a game from the 2017/18 Champions League season. We note the competition_id and season_id in that row, which are 16 and 1 respectively. Now we extract the matches dataset by typing the following:
mat = sb.matches(competition_id = 16, season_id = 1)
credentials were not supplied. open data access only
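The manual lookup described above can also be done with a boolean filter. The following is a minimal sketch on a hand-copied subset of the comp table (comp_sample is an illustrative stand-in, not data fetched from StatsBomb); the same filter should apply to the real comp dataframe.

```python
import pandas as pd

# A few rows copied by hand from the comp table above (toy data).
comp_sample = pd.DataFrame({
    "competition_id": [16, 16, 16],
    "season_id": [4, 1, 2],
    "competition_name": ["Champions League"] * 3,
    "season_name": ["2018/2019", "2017/2018", "2016/2017"],
})

# Select the row for the competition and season we care about.
mask = (
    (comp_sample["competition_name"] == "Champions League")
    & (comp_sample["season_name"] == "2017/2018")
)
row = comp_sample.loc[mask].iloc[0]

competition_id = int(row["competition_id"])
season_id = int(row["season_id"])
print(competition_id, season_id)
```

The two ids obtained this way can then be passed to sb.matches() instead of being typed in by hand.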
The mat dataset looks like this:
mat
| | match_id | match_date | kick_off | competition | season | home_team | away_team | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition_stage | stadium | referee | data_version | shot_fidelity_version | xy_fidelity_version |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18245 | 2018-05-26 | 20:45:00.000 | Europe - Champions League | 2017/2018 | Real Madrid | Liverpool | 3 | 1 | available | unscheduled | 2021-01-23T21:55:30.425330 | None | 7 | Final | NSK Olimpijs'kyj | M. Mažić | 1.1.0 | 2 | 2 |
The mat dataset gives us the match ids, the match dates, the kick-off times, the home and away teams, the scores in each match, the name of the referee who officiated the match, and so on. Here match_id is the unique id that will help us draw out event data for a particular match from the 2017/18 Champions League season. We see there is only one match available, with match_id = 18245, which was the Champions League final between Real Madrid and Liverpool ⚽ that took place at the NSC Olimpiyskiy (the Olimpiyskiy National Sports Complex) in Kyiv and ended 3-1 in Real Madrid's favor 👀. A great feat, to be honest! Let us obtain the event data for this match.
events = sb.events(match_id = 18245)
credentials were not supplied. open data access only
The events dataset, which holds the event data for this match, looks like this:
events
| | 50_50 | ball_receipt_outcome | ball_recovery_recovery_failure | block_offensive | carry_end_location | clearance_aerial_won | clearance_body_part | clearance_head | clearance_left_foot | clearance_right_foot | ... | shot_statsbomb_xg | shot_technique | shot_type | substitution_outcome | substitution_replacement | tactics | team | timestamp | type | under_pressure |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | 00:00:00.000 | Starting XI | NaN |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | 00:00:00.000 | Starting XI | NaN |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3492 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:42:21.211 | Offside | NaN |
3493 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:48:31.725 | Half End | NaN |
3494 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:48:31.725 | Half End | NaN |
3495 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:48:02.893 | Half End | NaN |
3496 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:48:02.893 | Half End | NaN |
3497 rows × 86 columns
print(events.columns)
Index(['50_50', 'ball_receipt_outcome', 'ball_recovery_recovery_failure', 'block_offensive', 'carry_end_location', 'clearance_aerial_won', 'clearance_body_part', 'clearance_head', 'clearance_left_foot', 'clearance_right_foot', 'counterpress', 'dribble_nutmeg', 'dribble_outcome', 'dribble_overrun', 'duel_outcome', 'duel_type', 'duration', 'foul_committed_advantage', 'foul_committed_card', 'foul_committed_type', 'foul_won_advantage', 'foul_won_defensive', 'goalkeeper_body_part', 'goalkeeper_end_location', 'goalkeeper_outcome', 'goalkeeper_position', 'goalkeeper_punched_out', 'goalkeeper_technique', 'goalkeeper_type', 'id', 'index', 'injury_stoppage_in_chain', 'interception_outcome', 'location', 'match_id', 'minute', 'off_camera', 'out', 'pass_aerial_won', 'pass_angle', 'pass_assisted_shot_id', 'pass_body_part', 'pass_cross', 'pass_cut_back', 'pass_end_location', 'pass_goal_assist', 'pass_height', 'pass_inswinging', 'pass_length', 'pass_miscommunication', 'pass_outcome', 'pass_outswinging', 'pass_recipient', 'pass_shot_assist', 'pass_straight', 'pass_switch', 'pass_technique', 'pass_through_ball', 'pass_type', 'period', 'play_pattern', 'player', 'position', 'possession', 'possession_team', 'related_events', 'second', 'shot_aerial_won', 'shot_body_part', 'shot_end_location', 'shot_first_time', 'shot_freeze_frame', 'shot_key_pass_id', 'shot_one_on_one', 'shot_outcome', 'shot_redirect', 'shot_statsbomb_xg', 'shot_technique', 'shot_type', 'substitution_outcome', 'substitution_replacement', 'tactics', 'team', 'timestamp', 'type', 'under_pressure'], dtype='object')
In this section, we draw a football pitch using mplsoccer. If you do not want to recreate a football pitch manually in Python (which would be rather tedious), you can simply use the mplsoccer module. To my knowledge, it provides the best functionality for drawing a football pitch. This package is maintained by Anmol Durgapal and Andrew Rowlinson.
Keep in mind that mplsoccer can do much more advanced visualization than drawing a football pitch. We will encounter those features as we move forward with other posts. For now, let us focus on visualizing a pitch in the simplest way possible. We need to pip install the package first:
pip install mplsoccer
mplsoccer requires Python 3.6+. Next, we import matplotlib and the Pitch class:
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
pitch = Pitch(pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
We set the pitch_color argument to 'grass', giving the impression of a real-life football pitch. Note that any other color can be set, for example 'black' or any color represented by its hex code. Omitting the stripe argument removes the darker stripes that appear on the pitch. The line_color argument is self-explanatory, and the user can change it as needed. By default, the axis, labels, and ticks representing the scales are switched off. The user can turn them on by setting the label, axis, and tick arguments to True, as is evident in the pitch above. Let us draw a different pitch with its color changed and stripes removed.
pitch = Pitch(pitch_color = 'black', line_color = 'white', constrained_layout = True,
              tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
Now let us focus on the axis range for a moment. By default, Pitch() sets the pitch type to statsbomb, where the y-axis is inverted and ranges from 80 to 0, and the x-axis ranges from 0 to 120. We will mostly be working with StatsBomb data, so these axis orientations won't be of much concern. Nevertheless, this information is very useful, and we must keep it in mind in case we deal with football data from other sources.
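To make the axis convention concrete, here is a minimal sketch that maps StatsBomb coordinates (x from 0 to 120, y from 80 down to 0) onto a unit square whose y-axis points up. The function name statsbomb_to_unit and the unit-square target are assumptions chosen for illustration; they are not part of mplsoccer or statsbombpy.

```python
def statsbomb_to_unit(x, y, length=120.0, width=80.0):
    """Map StatsBomb pitch coordinates to a unit square with y pointing up.

    StatsBomb's y-axis is inverted (it runs from 80 at the bottom of the
    drawn pitch to 0 at the top), so we flip it with 1 - y / width.
    """
    return x / length, 1.0 - y / width


# The centre spot stays in the middle after the conversion.
print(statsbomb_to_unit(60.0, 40.0))
```

A conversion of this kind is what we would need when combining StatsBomb events with data from a provider whose y-axis is not inverted.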
To be precise, mplsoccer provides eight different pitch types: 'statsbomb', 'opta', 'tracab', 'skillcorner', 'wyscout', 'metricasports', 'uefa', and 'custom'. The type can be set using the pitch_type argument of Pitch(). Let us check the orientation of the uefa pitch type:
pitch = Pitch(pitch_color='grass', stripe = True, pitch_type = 'uefa', line_color = 'white', constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
To draw a vertical pitch, we use the argument orientation and set it to 'vertical'.
pitch = Pitch(orientation = 'vertical', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
              tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
To draw only half of the pitch, we set the view argument to 'half'.
pitch = Pitch(view = 'half', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
              tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
This covers the basics of drawing a pitch with mplsoccer. The pitches can be further customized to meet the user's visualization needs. Keep an eye on the mplsoccer documentation to learn more. In the next section, we will learn how to visualize a pass network for a particular team from a match and analyze the network with the help of the NetworkX Python package. This package will help us use basic concepts from the complex network analysis literature to analyze the network and deduce some interesting properties from it.
pip install networkx
Then import networkx:
import networkx as nx
We also pip install the seaborn package, a Python package built on matplotlib that is used for generating informative and appealing statistical graphs.
pip install seaborn
Import seaborn too:
import seaborn as sns
Looking at the events dataset, we notice that there is a column named tactics that provides us with the team lineups, formations, player ids, and jersey numbers for both teams. The corresponding values in the type column tell us whether the row describes the starting 11 formation, a tactical shift, or some other development in the teams. Let us generate a new dataset focusing only on the tactics, team, and type columns. We filter the data so that the tactics column has no rows set to NaN.
tact = events[events['tactics'].notnull()]
tact = tact[['tactics', 'team', 'type']]
The tact dataset looks like:
tact
| | tactics | team | type |
---|---|---|---|
0 | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | Starting XI |
1 | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | Starting XI |
3489 | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | Tactical Shift |
3490 | {'formation': 433, 'lineup': [{'player': {'id'... | Real Madrid | Tactical Shift |
3491 | {'formation': 433, 'lineup': [{'player': {'id'... | Real Madrid | Tactical Shift |
Looking at the first two rows of the type column in tact, we see that they are set as 'Starting XI', one for each team. Let us fetch the data for each team separately, filtering by type:
tact = tact[tact['type'] == 'Starting XI']
tact_Real = tact[tact['team'] == 'Real Madrid']
tact_Liv = tact[tact['team'] == 'Liverpool']
tact_Real = tact_Real['tactics']
tact_Liv = tact_Liv['tactics']
tact_Real and tact_Liv are now pandas Series with a single row each, keeping their original indices 0 and 1 (which we will use to extract the data), and each entry of the tactics column is a Python dict object. For now, we are only interested in the key 'lineup', which holds all the information about the players from the teams.
dict_Real = tact_Real[0]['lineup']
dict_Liv = tact_Liv[1]['lineup']
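The filtering steps above can be wrapped in a small helper. This is only a sketch: starting_lineup is a hypothetical function name, and events_toy is a hand-built stand-in for the real events dataframe with lineups shortened to one player each.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the events dataframe (not fetched from StatsBomb).
events_toy = pd.DataFrame({
    "tactics": [
        {"formation": 41212,
         "lineup": [{"player": {"id": 5597, "name": "Keylor Navas Gamboa"},
                     "jersey_number": 1}]},
        {"formation": 433,
         "lineup": [{"player": {"id": 3630, "name": "Loris Karius"},
                     "jersey_number": 1}]},
        np.nan,
    ],
    "team": ["Real Madrid", "Liverpool", "Real Madrid"],
    "type": ["Starting XI", "Starting XI", "Half Start"],
})


def starting_lineup(events, team):
    """Return the 'lineup' list from the Starting XI row of the given team."""
    mask = (
        events["tactics"].notnull()
        & (events["type"] == "Starting XI")
        & (events["team"] == team)
    )
    # iloc[0] picks the first matching row without relying on its index label.
    return events.loc[mask, "tactics"].iloc[0]["lineup"]


print(starting_lineup(events_toy, "Liverpool"))
```

Using iloc[0] here means we do not need to remember that Real Madrid sits at index 0 and Liverpool at index 1.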
We use the from_dict() function provided by pandas to convert the lineup into a dataframe.
lineup_Real = pd.DataFrame.from_dict(dict_Real)
lineup_Real
| | player | position | jersey_number |
---|---|---|---|
0 | {'id': 5597, 'name': 'Keylor Navas Gamboa'} | {'id': 1, 'name': 'Goalkeeper'} | 1 |
1 | {'id': 5721, 'name': 'Daniel Carvajal Ramos'} | {'id': 2, 'name': 'Right Back'} | 2 |
2 | {'id': 5485, 'name': 'Raphaël Varane'} | {'id': 3, 'name': 'Right Center Back'} | 5 |
3 | {'id': 5201, 'name': 'Sergio Ramos García'} | {'id': 5, 'name': 'Left Center Back'} | 4 |
4 | {'id': 5552, 'name': 'Marcelo Vieira da Silva ... | {'id': 6, 'name': 'Left Back'} | 12 |
5 | {'id': 5539, 'name': 'Carlos Henrique Casimiro'} | {'id': 10, 'name': 'Center Defensive Midfield'} | 14 |
6 | {'id': 5463, 'name': 'Luka Modrić'} | {'id': 13, 'name': 'Right Center Midfield'} | 10 |
7 | {'id': 5574, 'name': 'Toni Kroos'} | {'id': 15, 'name': 'Left Center Midfield'} | 8 |
8 | {'id': 4926, 'name': 'Francisco Román Alarcón ... | {'id': 19, 'name': 'Center Attacking Midfield'} | 22 |
9 | {'id': 19677, 'name': 'Karim Benzema'} | {'id': 22, 'name': 'Right Center Forward'} | 9 |
10 | {'id': 5207, 'name': 'Cristiano Ronaldo dos Sa... | {'id': 24, 'name': 'Left Center Forward'} | 7 |
lineup_Liv = pd.DataFrame.from_dict(dict_Liv)
lineup_Liv
| | player | position | jersey_number |
---|---|---|---|
0 | {'id': 3630, 'name': 'Loris Karius'} | {'id': 1, 'name': 'Goalkeeper'} | 1 |
1 | {'id': 3664, 'name': 'Trent Alexander-Arnold'} | {'id': 2, 'name': 'Right Back'} | 66 |
2 | {'id': 3471, 'name': 'Dejan Lovren'} | {'id': 3, 'name': 'Right Center Back'} | 6 |
3 | {'id': 3669, 'name': 'Virgil van Dijk'} | {'id': 5, 'name': 'Left Center Back'} | 4 |
4 | {'id': 3655, 'name': 'Andrew Robertson'} | {'id': 6, 'name': 'Left Back'} | 26 |
5 | {'id': 3532, 'name': 'Jordan Brian Henderson'} | {'id': 10, 'name': 'Center Defensive Midfield'} | 14 |
6 | {'id': 3567, 'name': 'Georginio Wijnaldum'} | {'id': 13, 'name': 'Right Center Midfield'} | 5 |
7 | {'id': 3473, 'name': 'James Philip Milner'} | {'id': 15, 'name': 'Left Center Midfield'} | 7 |
8 | {'id': 3531, 'name': 'Mohamed Salah'} | {'id': 17, 'name': 'Right Wing'} | 11 |
9 | {'id': 3629, 'name': 'Sadio Mané'} | {'id': 21, 'name': 'Left Wing'} | 19 |
10 | {'id': 3535, 'name': 'Roberto Firmino Barbosa ... | {'id': 23, 'name': 'Center Forward'} | 9 |
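In the lineup dataframes above, the player and position columns still hold nested dictionaries. As an optional alternative, pandas' json_normalize can flatten them into separate columns. The sketch below uses a two-player sample copied from the Liverpool lineup; lineup_sample is an illustrative name.

```python
import pandas as pd

# Two entries copied from the Liverpool lineup shown above.
lineup_sample = [
    {"player": {"id": 3630, "name": "Loris Karius"},
     "position": {"id": 1, "name": "Goalkeeper"},
     "jersey_number": 1},
    {"player": {"id": 3664, "name": "Trent Alexander-Arnold"},
     "position": {"id": 2, "name": "Right Back"},
     "jersey_number": 66},
]

# Nested keys become dotted column names such as 'player.name'.
flat = pd.json_normalize(lineup_sample)
print(sorted(flat.columns))
print(flat[["player.name", "jersey_number"]])
```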
# Map each Real Madrid starter's name to their jersey number (as a string)
players_Real = {}
for i in range(len(lineup_Real)):
    key = lineup_Real.player[i]['name']
    val = lineup_Real.jersey_number[i]
    players_Real[key] = str(val)
print(players_Real)
{'Keylor Navas Gamboa': '1', 'Daniel Carvajal Ramos': '2', 'Raphaël Varane': '5', 'Sergio Ramos García': '4', 'Marcelo Vieira da Silva Júnior': '12', 'Carlos Henrique Casimiro': '14', 'Luka Modrić': '10', 'Toni Kroos': '8', 'Francisco Román Alarcón Suárez': '22', 'Karim Benzema': '9', 'Cristiano Ronaldo dos Santos Aveiro': '7'}
# Map each Liverpool starter's name to their jersey number (as a string)
players_Liv = {}
for i in range(len(lineup_Liv)):
    key = lineup_Liv.player[i]['name']
    val = lineup_Liv.jersey_number[i]
    players_Liv[key] = str(val)
print(players_Liv)
{'Loris Karius': '1', 'Trent Alexander-Arnold': '66', 'Dejan Lovren': '6', 'Virgil van Dijk': '4', 'Andrew Robertson': '26', 'Jordan Brian Henderson': '14', 'Georginio Wijnaldum': '5', 'James Philip Milner': '7', 'Mohamed Salah': '11', 'Sadio Mané': '19', 'Roberto Firmino Barbosa de Oliveira': '9'}
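The two loops above do the same thing for each team, so they can be expressed once as a dict comprehension. This is a sketch under the assumption that we work directly on the 'lineup' list rather than on the dataframe; jersey_map is a hypothetical helper name.

```python
def jersey_map(lineup):
    """Map player name -> jersey number (as a string) for a StatsBomb-style lineup list."""
    return {entry["player"]["name"]: str(entry["jersey_number"]) for entry in lineup}


# Three entries copied from the Real Madrid lineup shown earlier.
lineup_sample = [
    {"player": {"id": 5597, "name": "Keylor Navas Gamboa"}, "jersey_number": 1},
    {"player": {"id": 5463, "name": "Luka Modrić"}, "jersey_number": 10},
    {"player": {"id": 5574, "name": "Toni Kroos"}, "jersey_number": 8},
]

print(jersey_map(lineup_sample))
```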
So, we have collected the names and jersey numbers of the players (starting 11) from both teams in separate dictionaries named players_Real and players_Liv. These will come in handy later!
Now, from the events dataset, we extract the columns relevant to our pass network analysis.
events_pn = events[['minute', 'second', 'team', 'type', 'location', 'pass_end_location', 'pass_outcome', 'player']]
The first 10 rows of the events_pn dataframe:
events_pn.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | Real Madrid | Starting XI | NaN | NaN | NaN | NaN |
1 | 0 | 0 | Liverpool | Starting XI | NaN | NaN | NaN | NaN |
2 | 0 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
3 | 0 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
4 | 45 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
5 | 45 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
The last 10 rows of the events_pn dataframe:
events_pn.tail(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
3487 | 82 | 27 | Liverpool | Substitution | NaN | NaN | NaN | James Philip Milner |
3488 | 88 | 21 | Real Madrid | Substitution | NaN | NaN | NaN | Karim Benzema |
3489 | 31 | 41 | Liverpool | Tactical Shift | NaN | NaN | NaN | NaN |
3490 | 61 | 1 | Real Madrid | Tactical Shift | NaN | NaN | NaN | NaN |
3491 | 88 | 34 | Real Madrid | Tactical Shift | NaN | NaN | NaN | NaN |
3492 | 42 | 21 | Real Madrid | Offside | [114.8, 41.4] | NaN | NaN | Karim Benzema |
3493 | 48 | 31 | Real Madrid | Half End | NaN | NaN | NaN | NaN |
3494 | 48 | 31 | Liverpool | Half End | NaN | NaN | NaN | NaN |
3495 | 93 | 2 | Liverpool | Half End | NaN | NaN | NaN | NaN |
3496 | 93 | 2 | Real Madrid | Half End | NaN | NaN | NaN | NaN |
events_Real = events_pn[events_pn['team'] == 'Real Madrid']
events_Liv = events_pn[events_pn['team'] == 'Liverpool']
View the first 10 rows from both the datasets:
events_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | Real Madrid | Starting XI | NaN | NaN | NaN | NaN |
2 | 0 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
5 | 45 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos |
11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro |
16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García |
17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior |
18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro |
events_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | Liverpool | Starting XI | NaN | NaN | NaN | NaN |
3 | 0 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
4 | 45 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson |
13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané |
14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira |
15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah |
25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold |
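Next, we keep only the rows where type is set to Pass. We take a copy so that new columns can be added later without pandas raising a SettingWithCopyWarning.
events_pn_Real = events_Real[events_Real['type'] == 'Pass'].copy()
events_pn_Liv = events_Liv[events_Liv['type'] == 'Pass'].copy()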
events_pn_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos |
11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro |
16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García |
17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior |
18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro |
19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane |
21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García |
events_pn_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson |
13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané |
14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira |
15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah |
25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold |
37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk |
38 | 2 | 3 | Liverpool | Pass | [43.2, 2.8] | [50.1, 4.8] | Incomplete | Andrew Robertson |
39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson |
In the events_pn_Real dataset, let us focus on the second and third rows (positions 1 and 2, with index labels 9 and 10). Luka Modrić makes the pass at around minute 0, second 10 (second row), and Daniel Carvajal Ramos receives it and plays the next pass at around minute 0, second 11 (third row). So, in both datasets, we add two extra columns named pass_maker and pass_receiver, where the pass_maker column is a copy of the player column and the pass_receiver column is the player column shifted by one place in the negative direction, so that each row holds the player who makes the next pass. This can be achieved with the shift() function provided by pandas. We perform this operation on both events_pn_Real and events_pn_Liv.
events_pn_Real['pass_maker'] = events_pn_Real['player']
events_pn_Real['pass_receiver'] = events_pn_Real['player'].shift(-1)
events_pn_Liv['pass_maker'] = events_pn_Liv['player']
events_pn_Liv['pass_receiver'] = events_pn_Liv['player'].shift(-1)
events_pn_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
---|---|---|---|---|---|---|---|---|---|---|
8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić |
9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos |
10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro |
11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Sergio Ramos García |
16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García | Sergio Ramos García | Marcelo Vieira da Silva Júnior |
17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro |
18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos |
19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos | Toni Kroos | Raphaël Varane |
20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García |
21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro |
events_pn_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
---|---|---|---|---|---|---|---|---|---|---|
6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner | James Philip Milner | Dejan Lovren |
7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren | Dejan Lovren | Jordan Brian Henderson |
12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané |
13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira |
14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah |
15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah | Mohamed Salah | Trent Alexander-Arnold |
25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold | Trent Alexander-Arnold | Virgil van Dijk |
37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson |
38 | 2 | 3 | Liverpool | Pass | [43.2, 2.8] | [50.1, 4.8] | Incomplete | Andrew Robertson | Andrew Robertson | Andrew Robertson |
39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner |
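To see what shift(-1) does in isolation, here is a minimal sketch on a four-row toy frame built from the first Real Madrid passes shown above. Note that the last row has no following pass, so its receiver is NaN. Also keep in mind that treating the next passer as the receiver is an approximation; the events columns listed earlier include pass_recipient, which may be a more direct source, although that is not explored here.

```python
import pandas as pd

# First four Real Madrid passers, copied from the table above (toy data).
passes = pd.DataFrame({
    "player": [
        "Raphaël Varane",
        "Luka Modrić",
        "Daniel Carvajal Ramos",
        "Carlos Henrique Casimiro",
    ]
})

# The maker is the player on the row; the receiver is the player on the next row.
passes["pass_maker"] = passes["player"]
passes["pass_receiver"] = passes["player"].shift(-1)

print(passes[["pass_maker", "pass_receiver"]])
```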
The rows where pass_outcome is set as NaN are actually the successful passes. We filter the datasets again to keep only the successful passes:
events_pn_Real = events_pn_Real[events_pn_Real['pass_outcome'].isnull()].reset_index()
events_pn_Liv = events_pn_Liv[events_pn_Liv['pass_outcome'].isnull()].reset_index()
events_pn_Real.head(10)
| | index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić |
1 | 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos |
2 | 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro |
3 | 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro |
4 | 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos |
5 | 19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos | Toni Kroos | Raphaël Varane |
6 | 20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García |
7 | 21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro |
8 | 22 | 0 | 58 | Real Madrid | Pass | [64.5, 11.1] | [54.2, 5.6] | NaN | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior |
9 | 23 | 0 | 59 | Real Madrid | Pass | [55.3, 5.5] | [83.9, 4.3] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Karim Benzema |
events_pn_Liv.head(10)
| | index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner | James Philip Milner | Dejan Lovren |
1 | 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané |
2 | 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira |
3 | 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah |
4 | 37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson |
5 | 39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner |
6 | 40 | 2 | 10 | Liverpool | Pass | [45.5, 4.0] | [27.4, 16.8] | NaN | James Philip Milner | James Philip Milner | Virgil van Dijk |
7 | 41 | 2 | 13 | Liverpool | Pass | [26.7, 19.6] | [27.8, 47.3] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
8 | 42 | 2 | 16 | Liverpool | Pass | [28.0, 45.4] | [28.4, 21.4] | NaN | Dejan Lovren | Dejan Lovren | Virgil van Dijk |
9 | 43 | 2 | 19 | Liverpool | Pass | [30.4, 25.7] | [30.7, 52.9] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
So it seems we have been able to clean and reshape the datasets sensibly. We now focus on building the pass network among the players who were in the starting 11 of each team, so for each team we will discard the rows containing pass events that took place after its first substitution. Let us find the minute and second of the first substitution for both Real Madrid and Liverpool.
Now, let us filter the datasets events_Real and events_Liv by setting the type to be Substitution. This will tell us when the first substitution took place for each team.
substitution_Real = events_Real[events_Real['type'] == 'Substitution']
substitution_Liv = events_Liv[events_Liv['type'] == 'Substitution']
substitution_Real
minute | second | team | type | location | pass_end_location | pass_outcome | player | |
---|---|---|---|---|---|---|---|---|
3485 | 36 | 17 | Real Madrid | Substitution | NaN | NaN | NaN | Daniel Carvajal Ramos |
3486 | 60 | 56 | Real Madrid | Substitution | NaN | NaN | NaN | Francisco Román Alarcón Suárez |
3488 | 88 | 21 | Real Madrid | Substitution | NaN | NaN | NaN | Karim Benzema |
substitution_Liv
minute | second | team | type | location | pass_end_location | pass_outcome | player | |
---|---|---|---|---|---|---|---|---|
3484 | 29 | 39 | Liverpool | Substitution | NaN | NaN | NaN | Mohamed Salah |
3487 | 82 | 27 | Liverpool | Substitution | NaN | NaN | NaN | James Philip Milner |
For Real Madrid, the first substitution takes place at the 36th minute and 17th second, whereas for Liverpool it takes place at the 29th minute and 39th second. Let us find these out by writing a small piece of Python code:
substitution_Real_minute = np.min(substitution_Real['minute'])
substitution_Real_minute_data = substitution_Real[substitution_Real['minute'] == substitution_Real_minute]
substitution_Real_second = np.min(substitution_Real_minute_data['second'])
print("minute =", substitution_Real_minute, "second =", substitution_Real_second)
minute = 36 second = 17
substitution_Liv_minute = np.min(substitution_Liv['minute'])
substitution_Liv_minute_data = substitution_Liv[substitution_Liv['minute'] == substitution_Liv_minute]
substitution_Liv_second = np.min(substitution_Liv_minute_data['second'])
print("minute =", substitution_Liv_minute, "second =", substitution_Liv_second)
minute = 29 second = 39
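The two-step lookup above (smallest minute, then smallest second within that minute) can also be sketched as a single sort. The toy frame below is made up for illustration and only mimics the minute and second columns of the substitution tables; first_substitution_time is a hypothetical helper name, not part of statsbombpy.

```python
import pandas as pd

def first_substitution_time(substitutions: pd.DataFrame) -> tuple:
    """Return (minute, second) of the earliest row, ordered by minute then second."""
    first = substitutions.sort_values(['minute', 'second']).iloc[0]
    return int(first['minute']), int(first['second'])

# Toy data mirroring the three Real Madrid substitution rows shown above
toy_subs = pd.DataFrame({'minute': [60, 36, 88], 'second': [56, 17, 21]})
sub_minute, sub_second = first_substitution_time(toy_subs)
print("minute =", sub_minute, "second =", sub_second)
```

Sorting on both columns at once avoids the intermediate dataframe and still handles two substitutions made in the same minute.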
# Keep the passes played up to and including the minute of each team's first substitution
events_pn_Real = events_pn_Real[(events_pn_Real['minute'] <= substitution_Real_minute)]
events_pn_Liv = events_pn_Liv[(events_pn_Liv['minute'] <= substitution_Liv_minute)]
events_pn_Real.head(10)
index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić |
1 | 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos |
2 | 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro |
3 | 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro |
4 | 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos |
5 | 19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos | Toni Kroos | Raphaël Varane |
6 | 20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García |
7 | 21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro |
8 | 22 | 0 | 58 | Real Madrid | Pass | [64.5, 11.1] | [54.2, 5.6] | NaN | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior |
9 | 23 | 0 | 59 | Real Madrid | Pass | [55.3, 5.5] | [83.9, 4.3] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Karim Benzema |
events_pn_Liv.head(10)
index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner | James Philip Milner | Dejan Lovren |
1 | 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané |
2 | 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira |
3 | 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah |
4 | 37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson |
5 | 39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner |
6 | 40 | 2 | 10 | Liverpool | Pass | [45.5, 4.0] | [27.4, 16.8] | NaN | James Philip Milner | James Philip Milner | Virgil van Dijk |
7 | 41 | 2 | 13 | Liverpool | Pass | [26.7, 19.6] | [27.8, 47.3] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
8 | 42 | 2 | 16 | Liverpool | Pass | [28.0, 45.4] | [28.4, 21.4] | NaN | Dejan Lovren | Dejan Lovren | Virgil van Dijk |
9 | 43 | 2 | 19 | Liverpool | Pass | [30.4, 25.7] | [30.7, 52.9] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
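Filtering on the minute alone keeps every pass played during the substitution minute, including those made after the substitution itself. If a stricter cut-off is wanted, one possible sketch compares the minute and the second together. The helper name passes_before and the toy rows are assumptions made for illustration; the tables that follow in this notebook were produced with the minute-only filter.

```python
import pandas as pd

def passes_before(events: pd.DataFrame, minute: int, second: int) -> pd.DataFrame:
    """Keep the rows that are strictly earlier than the given (minute, second)."""
    earlier_minute = events['minute'] < minute
    same_minute_earlier_second = (events['minute'] == minute) & (events['second'] < second)
    return events[earlier_minute | same_minute_earlier_second].copy()

# Toy pass events around Liverpool's first substitution at 29:39
toy_passes = pd.DataFrame({'minute': [28, 29, 29, 30],
                           'second': [50, 10, 45, 5],
                           'pass_maker': ['A', 'B', 'C', 'D']})
kept = passes_before(toy_passes, 29, 39)
print(kept['pass_maker'].tolist())
```

Returning a copy also avoids the pandas warning that can appear when new columns are later assigned to a filtered slice.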
Now, in both datasets, we will split the location and the pass_end_location columns into two columns each, one per coordinate, and name them pass_maker_x, pass_maker_y, pass_receiver_x and pass_receiver_y.
Let us manipulate the dataset for Real Madrid first:
Loc = events_pn_Real['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['pass_maker_x', 'pass_maker_y'])
Loc_end = events_pn_Real['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_receiver_x', 'pass_receiver_y'])
events_pn_Real['pass_maker_x'] = Loc['pass_maker_x']
events_pn_Real['pass_maker_y'] = Loc['pass_maker_y']
events_pn_Real['pass_receiver_x'] = Loc_end['pass_receiver_x']
events_pn_Real['pass_receiver_y'] = Loc_end['pass_receiver_y']
events_pn_Real = events_pn_Real[['index', 'minute', 'second', 'team', 'type', 'pass_outcome',
                                 'player', 'pass_maker', 'pass_receiver', 'pass_maker_x',
                                 'pass_maker_y', 'pass_receiver_x', 'pass_receiver_y']]
events_pn_Real.head(10)
index | minute | second | team | type | pass_outcome | player | pass_maker | pass_receiver | pass_maker_x | pass_maker_y | pass_receiver_x | pass_receiver_y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8 | 0 | 8 | Real Madrid | Pass | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić | 27.4 | 60.2 | 36.1 | 71.6 |
1 | 9 | 0 | 10 | Real Madrid | Pass | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos | 35.3 | 75.4 | 22.4 | 76.6 |
2 | 10 | 0 | 11 | Real Madrid | Pass | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro | 22.3 | 76.6 | 33.4 | 68.0 |
3 | 17 | 0 | 40 | Real Madrid | Pass | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro | 57.5 | 4.6 | 49.2 | 15.6 |
4 | 18 | 0 | 43 | Real Madrid | Pass | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos | 48.8 | 18.4 | 49.8 | 12.5 |
5 | 19 | 0 | 46 | Real Madrid | Pass | NaN | Toni Kroos | Toni Kroos | Raphaël Varane | 48.8 | 13.9 | 36.1 | 56.3 |
6 | 20 | 0 | 52 | Real Madrid | Pass | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García | 41.3 | 54.8 | 34.4 | 40.2 |
7 | 21 | 0 | 55 | Real Madrid | Pass | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro | 39.1 | 36.5 | 65.4 | 13.1 |
8 | 22 | 0 | 58 | Real Madrid | Pass | NaN | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 64.5 | 11.1 | 54.2 | 5.6 |
9 | 23 | 0 | 59 | Real Madrid | Pass | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Karim Benzema | 55.3 | 5.5 | 83.9 | 4.3 |
Loc = events_pn_Liv['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['pass_maker_x', 'pass_maker_y'])
Loc_end = events_pn_Liv['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_receiver_x', 'pass_receiver_y'])
events_pn_Liv['pass_maker_x'] = Loc['pass_maker_x']
events_pn_Liv['pass_maker_y'] = Loc['pass_maker_y']
events_pn_Liv['pass_receiver_x'] = Loc_end['pass_receiver_x']
events_pn_Liv['pass_receiver_y'] = Loc_end['pass_receiver_y']
events_pn_Liv = events_pn_Liv[['index', 'minute', 'second', 'team', 'type', 'pass_outcome',
                               'player', 'pass_maker', 'pass_receiver', 'pass_maker_x',
                               'pass_maker_y', 'pass_receiver_x', 'pass_receiver_y']]
events_pn_Liv.head(10)
index | minute | second | team | type | pass_outcome | player | pass_maker | pass_receiver | pass_maker_x | pass_maker_y | pass_receiver_x | pass_receiver_y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6 | 0 | 0 | Liverpool | Pass | NaN | James Philip Milner | James Philip Milner | Dejan Lovren | 60.0 | 40.0 | 32.1 | 41.2 |
1 | 12 | 0 | 16 | Liverpool | Pass | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané | 76.5 | 18.1 | 84.8 | 9.5 |
2 | 13 | 0 | 18 | Liverpool | Pass | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira | 84.4 | 10.0 | 92.5 | 19.1 |
3 | 14 | 0 | 19 | Liverpool | Pass | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah | 91.6 | 21.3 | 90.6 | 50.7 |
4 | 37 | 2 | 0 | Liverpool | Pass | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson | 9.9 | 39.1 | 28.1 | 4.2 |
5 | 39 | 2 | 7 | Liverpool | Pass | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner | 53.2 | 0.1 | 50.0 | 4.0 |
6 | 40 | 2 | 10 | Liverpool | Pass | NaN | James Philip Milner | James Philip Milner | Virgil van Dijk | 45.5 | 4.0 | 27.4 | 16.8 |
7 | 41 | 2 | 13 | Liverpool | Pass | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren | 26.7 | 19.6 | 27.8 | 47.3 |
8 | 42 | 2 | 16 | Liverpool | Pass | NaN | Dejan Lovren | Dejan Lovren | Virgil van Dijk | 28.0 | 45.4 | 28.4 | 21.4 |
9 | 43 | 2 | 19 | Liverpool | Pass | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren | 30.4 | 25.7 | 30.7 | 52.9 |
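The column assignments above line up correctly as long as the events dataframe carries a default 0, 1, 2, ... index, because the freshly built Loc and Loc_end dataframes always start from such an index. A sketch that makes the alignment explicit passes the original index along. The helper split_xy and the toy rows are illustrative assumptions, not code from the notebook.

```python
import pandas as pd

def split_xy(events: pd.DataFrame, column: str, x_name: str, y_name: str) -> pd.DataFrame:
    """Expand a column of [x, y] lists into two numeric columns, keeping row alignment."""
    xy = pd.DataFrame(events[column].tolist(), index=events.index, columns=[x_name, y_name])
    return events.drop(columns=[column]).join(xy)

# Toy rows with a non-default index, as an earlier filter might leave behind
toy = pd.DataFrame({'player': ['A', 'B'],
                    'location': [[27.4, 60.2], [35.3, 75.4]]}, index=[3, 7])
out = split_xy(toy, 'location', 'pass_maker_x', 'pass_maker_y')
print(out)
```

The same helper could be called a second time for pass_end_location to obtain pass_receiver_x and pass_receiver_y.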
av_loc_Real = events_pn_Real.groupby('pass_maker').agg({'pass_maker_x': ['mean'],
                                                        'pass_maker_y': ['mean', 'count']})
av_loc_Real
pass_maker | pass_maker_x (mean) | pass_maker_y (mean) | pass_maker_y (count) |
---|---|---|---|
Carlos Henrique Casimiro | 60.845455 | 31.836364 | 11 |
Cristiano Ronaldo dos Santos Aveiro | 81.580000 | 29.160000 | 10 |
Daniel Carvajal Ramos | 64.341667 | 73.875000 | 24 |
Francisco Román Alarcón Suárez | 62.323529 | 27.082353 | 17 |
Karim Benzema | 65.081818 | 27.936364 | 11 |
Keylor Navas Gamboa | 10.870000 | 41.810000 | 10 |
Luka Modrić | 60.604762 | 55.028571 | 21 |
Marcelo Vieira da Silva Júnior | 59.865217 | 11.130435 | 23 |
Raphaël Varane | 37.436364 | 58.354545 | 22 |
Sergio Ramos García | 41.282353 | 24.514706 | 34 |
Toni Kroos | 51.190000 | 24.275000 | 40 |
The groupby() function from pandas splits events_pn_Real into groups indexed by the player names, whereas the agg() function aggregates each group into the average location of the pass maker and the number of passes made by that player. Now let us simplify the column names of av_loc_Real:
av_loc_Real.columns = ['pass_maker_x', 'pass_maker_y', 'count']
av_loc_Real
pass_maker_x | pass_maker_y | count | |
---|---|---|---|
pass_maker | |||
Carlos Henrique Casimiro | 60.845455 | 31.836364 | 11 |
Cristiano Ronaldo dos Santos Aveiro | 81.580000 | 29.160000 | 10 |
Daniel Carvajal Ramos | 64.341667 | 73.875000 | 24 |
Francisco Román Alarcón Suárez | 62.323529 | 27.082353 | 17 |
Karim Benzema | 65.081818 | 27.936364 | 11 |
Keylor Navas Gamboa | 10.870000 | 41.810000 | 10 |
Luka Modrić | 60.604762 | 55.028571 | 21 |
Marcelo Vieira da Silva Júnior | 59.865217 | 11.130435 | 23 |
Raphaël Varane | 37.436364 | 58.354545 | 22 |
Sergio Ramos García | 41.282353 | 24.514706 | 34 |
Toni Kroos | 51.190000 | 24.275000 | 40 |
Let us do the same for Liverpool:
av_loc_Liv = events_pn_Liv.groupby('pass_maker').agg({'pass_maker_x': ['mean'],
                                                      'pass_maker_y': ['mean', 'count']})
av_loc_Liv.columns = ['pass_maker_x', 'pass_maker_y', 'count']
av_loc_Liv
pass_maker_x | pass_maker_y | count | |
---|---|---|---|
pass_maker | |||
Andrew Robertson | 59.815385 | 6.830769 | 13 |
Dejan Lovren | 41.690909 | 60.172727 | 11 |
Georginio Wijnaldum | 76.390909 | 28.518182 | 11 |
James Philip Milner | 72.353333 | 36.153333 | 15 |
Jordan Brian Henderson | 61.035294 | 37.152941 | 17 |
Loris Karius | 12.914286 | 40.385714 | 7 |
Mohamed Salah | 77.550000 | 64.710000 | 10 |
Roberto Firmino Barbosa de Oliveira | 78.250000 | 43.570000 | 10 |
Sadio Mané | 86.275000 | 22.075000 | 4 |
Trent Alexander-Arnold | 64.666667 | 72.550000 | 12 |
Virgil van Dijk | 43.366667 | 25.433333 | 9 |
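As an alternative to aggregating with a dictionary and renaming the columns afterwards, pandas also offers named aggregation, which produces flat column names in one step. The following is a minimal sketch on made-up rows; the player labels and coordinates are toy values, not match data.

```python
import pandas as pd

toy = pd.DataFrame({'pass_maker': ['Kroos', 'Kroos', 'Modrić'],
                    'pass_maker_x': [50.0, 52.0, 60.0],
                    'pass_maker_y': [24.0, 26.0, 55.0]})

# Named aggregation: output_column=(input_column, aggregation)
av_loc = toy.groupby('pass_maker').agg(pass_maker_x=('pass_maker_x', 'mean'),
                                       pass_maker_y=('pass_maker_y', 'mean'),
                                       count=('pass_maker_y', 'count'))
print(av_loc)
```

The result has the same three columns as av_loc_Real and av_loc_Liv, without the intermediate two-level header.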
Next, let us count the passes exchanged between each ordered pair of players (note that a pass from a player A to a player B is not identical to a pass from player B to player A). We will use the groupby() and the count() functions to count the number of rows in which a given player A passed the ball to a given player B.
pass_Real = events_pn_Real.groupby(['pass_maker', 'pass_receiver']).index.count().reset_index()
pass_Real.head(10)
pass_maker | pass_receiver | index | |
---|---|---|---|
0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 |
1 | Carlos Henrique Casimiro | Luka Modrić | 1 |
2 | Carlos Henrique Casimiro | Marcelo Vieira da Silva Júnior | 1 |
3 | Carlos Henrique Casimiro | Raphaël Varane | 1 |
4 | Carlos Henrique Casimiro | Sergio Ramos García | 1 |
5 | Carlos Henrique Casimiro | Toni Kroos | 6 |
6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 |
7 | Cristiano Ronaldo dos Santos Aveiro | Karim Benzema | 1 |
8 | Cristiano Ronaldo dos Santos Aveiro | Luka Modrić | 1 |
9 | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 4 |
pass_Liv = events_pn_Liv.groupby(['pass_maker', 'pass_receiver']).index.count().reset_index()
pass_Liv.head(10)
pass_maker | pass_receiver | index | |
---|---|---|---|
0 | Andrew Robertson | Andrew Robertson | 1 |
1 | Andrew Robertson | Georginio Wijnaldum | 3 |
2 | Andrew Robertson | James Philip Milner | 3 |
3 | Andrew Robertson | Jordan Brian Henderson | 2 |
4 | Andrew Robertson | Roberto Firmino Barbosa de Oliveira | 2 |
5 | Andrew Robertson | Virgil van Dijk | 2 |
6 | Dejan Lovren | James Philip Milner | 1 |
7 | Dejan Lovren | Jordan Brian Henderson | 1 |
8 | Dejan Lovren | Loris Karius | 2 |
9 | Dejan Lovren | Mohamed Salah | 1 |
Let us rename the index column to number_of_passes:
pass_Real.rename(columns = {'index':'number_of_passes'}, inplace = True)
pass_Real.head(10)
pass_maker | pass_receiver | number_of_passes | |
---|---|---|---|
0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 |
1 | Carlos Henrique Casimiro | Luka Modrić | 1 |
2 | Carlos Henrique Casimiro | Marcelo Vieira da Silva Júnior | 1 |
3 | Carlos Henrique Casimiro | Raphaël Varane | 1 |
4 | Carlos Henrique Casimiro | Sergio Ramos García | 1 |
5 | Carlos Henrique Casimiro | Toni Kroos | 6 |
6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 |
7 | Cristiano Ronaldo dos Santos Aveiro | Karim Benzema | 1 |
8 | Cristiano Ronaldo dos Santos Aveiro | Luka Modrić | 1 |
9 | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 4 |
pass_Liv.rename(columns = {'index':'number_of_passes'}, inplace = True)
pass_Liv.head(10)
pass_maker | pass_receiver | number_of_passes | |
---|---|---|---|
0 | Andrew Robertson | Andrew Robertson | 1 |
1 | Andrew Robertson | Georginio Wijnaldum | 3 |
2 | Andrew Robertson | James Philip Milner | 3 |
3 | Andrew Robertson | Jordan Brian Henderson | 2 |
4 | Andrew Robertson | Roberto Firmino Barbosa de Oliveira | 2 |
5 | Andrew Robertson | Virgil van Dijk | 2 |
6 | Dejan Lovren | James Philip Milner | 1 |
7 | Dejan Lovren | Jordan Brian Henderson | 1 |
8 | Dejan Lovren | Loris Karius | 2 |
9 | Dejan Lovren | Mohamed Salah | 1 |
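The count-then-rename steps above can also be sketched in one go with size(), which counts the rows in each group, together with reset_index(name=...). The toy frame below is invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'pass_maker':    ['A', 'A', 'B', 'A'],
                    'pass_receiver': ['B', 'B', 'A', 'C']})

# One row per ordered (maker, receiver) pair, with the number of passes between them
pair_counts = (toy.groupby(['pass_maker', 'pass_receiver'])
                  .size()
                  .reset_index(name='number_of_passes'))
print(pair_counts)
```

Note that size() counts rows, whereas count() counts the non-missing values of the selected column; the two agree here because the counted column has no missing values.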
We will now merge the two dataframes av_loc_Real and pass_Real. Let us first identify the left and the right dataframes for performing the merge. Here, pass_Real (the dataframe on which merge() is called) is the left dataframe and av_loc_Real is the right. We will use the merge() function from pandas to carry out the merging operation.
pass_Real = pass_Real.merge(av_loc_Real, left_on = 'pass_maker', right_index = True)
pass_Real.head(10)
pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | |
---|---|---|---|---|---|---|
0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 | 60.845455 | 31.836364 | 11 |
1 | Carlos Henrique Casimiro | Luka Modrić | 1 | 60.845455 | 31.836364 | 11 |
2 | Carlos Henrique Casimiro | Marcelo Vieira da Silva Júnior | 1 | 60.845455 | 31.836364 | 11 |
3 | Carlos Henrique Casimiro | Raphaël Varane | 1 | 60.845455 | 31.836364 | 11 |
4 | Carlos Henrique Casimiro | Sergio Ramos García | 1 | 60.845455 | 31.836364 | 11 |
5 | Carlos Henrique Casimiro | Toni Kroos | 6 | 60.845455 | 31.836364 | 11 |
6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 | 81.580000 | 29.160000 | 10 |
7 | Cristiano Ronaldo dos Santos Aveiro | Karim Benzema | 1 | 81.580000 | 29.160000 | 10 |
8 | Cristiano Ronaldo dos Santos Aveiro | Luka Modrić | 1 | 81.580000 | 29.160000 | 10 |
9 | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 4 | 81.580000 | 29.160000 | 10 |
The left_on argument specifies the column of the left dataframe to join on, and the right_index argument decides whether to use the index of the right dataframe as the join key. Let us do the same operation for the other team:
pass_Liv = pass_Liv.merge(av_loc_Liv, left_on = 'pass_maker', right_index = True)
pass_Liv.head(10)
pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | |
---|---|---|---|---|---|---|
0 | Andrew Robertson | Andrew Robertson | 1 | 59.815385 | 6.830769 | 13 |
1 | Andrew Robertson | Georginio Wijnaldum | 3 | 59.815385 | 6.830769 | 13 |
2 | Andrew Robertson | James Philip Milner | 3 | 59.815385 | 6.830769 | 13 |
3 | Andrew Robertson | Jordan Brian Henderson | 2 | 59.815385 | 6.830769 | 13 |
4 | Andrew Robertson | Roberto Firmino Barbosa de Oliveira | 2 | 59.815385 | 6.830769 | 13 |
5 | Andrew Robertson | Virgil van Dijk | 2 | 59.815385 | 6.830769 | 13 |
6 | Dejan Lovren | James Philip Milner | 1 | 41.690909 | 60.172727 | 11 |
7 | Dejan Lovren | Jordan Brian Henderson | 1 | 41.690909 | 60.172727 | 11 |
8 | Dejan Lovren | Loris Karius | 2 | 41.690909 | 60.172727 | 11 |
9 | Dejan Lovren | Mohamed Salah | 1 | 41.690909 | 60.172727 | 11 |
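To make the roles of left_on and right_index concrete, here is a minimal sketch on toy data. The player labels and coordinates are invented; the validate argument is an optional extra, not used in the notebook, that asks pandas to check that each player appears only once in the right dataframe.

```python
import pandas as pd

# Left dataframe: one row per (maker, receiver) pair
pairs = pd.DataFrame({'pass_maker': ['A', 'A', 'B'],
                      'pass_receiver': ['B', 'C', 'A'],
                      'number_of_passes': [2, 1, 1]})
# Right dataframe: one row per player, with the player name as the index
avg_loc = pd.DataFrame({'pass_maker_x': [40.0, 60.0, 80.0],
                        'pass_maker_y': [30.0, 50.0, 20.0]},
                       index=['A', 'B', 'C'])

# Match the 'pass_maker' column of the left frame against the index of the right frame
merged = pairs.merge(avg_loc, left_on='pass_maker', right_index=True,
                     validate='many_to_one')
print(merged)
```

Every row of the left dataframe picks up the average location of its pass maker, which is why the same coordinates repeat for all pairs that share a maker.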
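# Merge again, this time on the receiver, to attach each receiver's average location
pass_Real = pass_Real.merge(av_loc_Real, left_on = 'pass_receiver',
                            right_index = True, suffixes = ['', '_receipt'])
# Note: count_receipt comes from av_loc_Real, so it is the number of passes the receiver made
pass_Real.rename(columns = {'pass_maker_x_receipt':'pass_receiver_x',
                            'pass_maker_y_receipt':'pass_receiver_y',
                            'count_receipt':'number_of_passes_received'}, inplace = True)
# Drop self-passes, where the maker and the receiver are the same player
pass_Real = pass_Real[pass_Real['pass_maker'] != pass_Real['pass_receiver']].reset_index()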
pass_Real
index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 | 60.845455 | 31.836364 | 11 | 64.341667 | 73.875 | 24 |
1 | 6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 | 81.580000 | 29.160000 | 10 | 64.341667 | 73.875 | 24 |
2 | 21 | Francisco Román Alarcón Suárez | Daniel Carvajal Ramos | 2 | 62.323529 | 27.082353 | 17 | 64.341667 | 73.875 | 24 |
3 | 29 | Karim Benzema | Daniel Carvajal Ramos | 2 | 65.081818 | 27.936364 | 11 | 64.341667 | 73.875 | 24 |
4 | 39 | Luka Modrić | Daniel Carvajal Ramos | 10 | 60.604762 | 55.028571 | 21 | 64.341667 | 73.875 | 24 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
73 | 16 | Daniel Carvajal Ramos | Keylor Navas Gamboa | 1 | 64.341667 | 73.875000 | 24 | 10.870000 | 41.810 | 10 |
74 | 30 | Karim Benzema | Keylor Navas Gamboa | 1 | 65.081818 | 27.936364 | 11 | 10.870000 | 41.810 | 10 |
75 | 57 | Raphaël Varane | Keylor Navas Gamboa | 2 | 37.436364 | 58.354545 | 22 | 10.870000 | 41.810 | 10 |
76 | 64 | Sergio Ramos García | Keylor Navas Gamboa | 1 | 41.282353 | 24.514706 | 34 | 10.870000 | 41.810 | 10 |
77 | 74 | Toni Kroos | Keylor Navas Gamboa | 1 | 51.190000 | 24.275000 | 40 | 10.870000 | 41.810 | 10 |
78 rows × 10 columns
pass_Liv = pass_Liv.merge(av_loc_Liv, left_on = 'pass_receiver',
                          right_index = True, suffixes = ['', '_receipt'])
pass_Liv.rename(columns = {'pass_maker_x_receipt':'pass_receiver_x',
                           'pass_maker_y_receipt':'pass_receiver_y',
                           'count_receipt':'number_of_passes_received'}, inplace = True)
pass_Liv = pass_Liv[pass_Liv['pass_maker'] != pass_Liv['pass_receiver']].reset_index()
pass_Liv
index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 12 | Georginio Wijnaldum | Andrew Robertson | 4 | 76.390909 | 28.518182 | 11 | 59.815385 | 6.830769 | 13 |
1 | 18 | James Philip Milner | Andrew Robertson | 1 | 72.353333 | 36.153333 | 15 | 59.815385 | 6.830769 | 13 |
2 | 28 | Jordan Brian Henderson | Andrew Robertson | 1 | 61.035294 | 37.152941 | 17 | 59.815385 | 6.830769 | 13 |
3 | 36 | Loris Karius | Andrew Robertson | 1 | 12.914286 | 40.385714 | 7 | 59.815385 | 6.830769 | 13 |
4 | 54 | Trent Alexander-Arnold | Andrew Robertson | 1 | 64.666667 | 72.550000 | 12 | 59.815385 | 6.830769 | 13 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
59 | 55 | Trent Alexander-Arnold | Dejan Lovren | 1 | 64.666667 | 72.550000 | 12 | 41.690909 | 60.172727 | 11 |
60 | 61 | Virgil van Dijk | Dejan Lovren | 3 | 43.366667 | 25.433333 | 9 | 41.690909 | 60.172727 | 11 |
61 | 25 | James Philip Milner | Sadio Mané | 2 | 72.353333 | 36.153333 | 15 | 86.275000 | 22.075000 | 4 |
62 | 33 | Jordan Brian Henderson | Sadio Mané | 1 | 61.035294 | 37.152941 | 17 | 86.275000 | 22.075000 | 4 |
63 | 43 | Mohamed Salah | Sadio Mané | 1 | 77.550000 | 64.710000 | 10 | 86.275000 | 22.075000 | 4 |
64 rows × 10 columns
pass_Real_new = pass_Real.replace({"pass_maker": players_Real, "pass_receiver": players_Real})
pass_Real_new
index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 14 | 2 | 1 | 60.845455 | 31.836364 | 11 | 64.341667 | 73.875 | 24 |
1 | 6 | 7 | 2 | 3 | 81.580000 | 29.160000 | 10 | 64.341667 | 73.875 | 24 |
2 | 21 | 22 | 2 | 2 | 62.323529 | 27.082353 | 17 | 64.341667 | 73.875 | 24 |
3 | 29 | 9 | 2 | 2 | 65.081818 | 27.936364 | 11 | 64.341667 | 73.875 | 24 |
4 | 39 | 10 | 2 | 10 | 60.604762 | 55.028571 | 21 | 64.341667 | 73.875 | 24 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
73 | 16 | 2 | 1 | 1 | 64.341667 | 73.875000 | 24 | 10.870000 | 41.810 | 10 |
74 | 30 | 9 | 1 | 1 | 65.081818 | 27.936364 | 11 | 10.870000 | 41.810 | 10 |
75 | 57 | 5 | 1 | 2 | 37.436364 | 58.354545 | 22 | 10.870000 | 41.810 | 10 |
76 | 64 | 4 | 1 | 1 | 41.282353 | 24.514706 | 34 | 10.870000 | 41.810 | 10 |
77 | 74 | 8 | 1 | 1 | 51.190000 | 24.275000 | 40 | 10.870000 | 41.810 | 10 |
78 rows × 10 columns
pass_Liv_new = pass_Liv.replace({"pass_maker": players_Liv, "pass_receiver": players_Liv})
pass_Liv_new
index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 12 | 5 | 26 | 4 | 76.390909 | 28.518182 | 11 | 59.815385 | 6.830769 | 13 |
1 | 18 | 7 | 26 | 1 | 72.353333 | 36.153333 | 15 | 59.815385 | 6.830769 | 13 |
2 | 28 | 14 | 26 | 1 | 61.035294 | 37.152941 | 17 | 59.815385 | 6.830769 | 13 |
3 | 36 | 1 | 26 | 1 | 12.914286 | 40.385714 | 7 | 59.815385 | 6.830769 | 13 |
4 | 54 | 66 | 26 | 1 | 64.666667 | 72.550000 | 12 | 59.815385 | 6.830769 | 13 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
59 | 55 | 66 | 6 | 1 | 64.666667 | 72.550000 | 12 | 41.690909 | 60.172727 | 11 |
60 | 61 | 4 | 6 | 3 | 43.366667 | 25.433333 | 9 | 41.690909 | 60.172727 | 11 |
61 | 25 | 7 | 19 | 2 | 72.353333 | 36.153333 | 15 | 86.275000 | 22.075000 | 4 |
62 | 33 | 14 | 19 | 1 | 61.035294 | 37.152941 | 17 | 86.275000 | 22.075000 | 4 |
63 | 43 | 11 | 19 | 1 | 77.550000 | 64.710000 | 10 | 86.275000 | 22.075000 | 4 |
64 rows × 10 columns
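The replace() calls above swap player names for jersey numbers through the players_Real and players_Liv dictionaries. The sketch below assumes such a dictionary maps full names to jersey numbers stored as strings; players_toy is a hypothetical two-entry excerpt, and 'Unknown Player' is an invented name used to show what happens when a name is missing from the mapping.

```python
import pandas as pd

# Hypothetical excerpt of a name -> jersey number mapping (numbers kept as strings)
players_toy = {'Toni Kroos': '8', 'Luka Modrić': '10'}

toy = pd.DataFrame({'pass_maker': ['Toni Kroos', 'Luka Modrić'],
                    'pass_receiver': ['Luka Modrić', 'Unknown Player']})

# replace() leaves names that are missing from the mapping untouched
replaced = toy.replace({'pass_maker': players_toy, 'pass_receiver': players_toy})
# map() turns names that are missing from the mapping into NaN instead
mapped = toy['pass_receiver'].map(players_toy)
unmapped = toy.loc[mapped.isna(), 'pass_receiver'].tolist()
print(replaced)
print("unmapped names:", unmapped)
```

Checking for unmapped names before building the graph helps avoid a network in which some nodes are jersey numbers and others are full names.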
pitch = Pitch(pitch_color='grass', goal_type = 'box', line_color='white', stripe = True,
              constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(pass_Real.pass_maker_x, pass_Real.pass_maker_y,
                      pass_Real.pass_receiver_x, pass_Real.pass_receiver_y, lw = 5,
                      color = 'black', zorder = 1, ax=ax)
nodes = pitch.scatter(av_loc_Real.pass_maker_x, av_loc_Real.pass_maker_y,
                      s=350, color = 'white', edgecolors='black', linewidth=1, alpha = 1, ax = ax)
for index, row in av_loc_Real.iterrows():
    pitch.annotate(players_Real[row.name], xy=(row.pass_maker_x, row.pass_maker_y),
                   c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Real Madrid against Liverpool", size = 20)
plt.show()
pitch = Pitch(pitch_color='grass', goal_type = 'box', stripe = True,
              line_color='white', constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(120 - pass_Liv.pass_maker_x, pass_Liv.pass_maker_y,
                      120 - pass_Liv.pass_receiver_x, pass_Liv.pass_receiver_y, lw = 5,
                      color = 'black', zorder = 1, ax = ax)
nodes = pitch.scatter(120 - av_loc_Liv.pass_maker_x, av_loc_Liv.pass_maker_y,
                      s=350, color = 'red', edgecolors = 'black', linewidth=1, alpha = 1, ax = ax)
for index, row in av_loc_Liv.iterrows():
    pitch.annotate(players_Liv[row.name], xy=(120 - row.pass_maker_x, row.pass_maker_y),
                   c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Liverpool against Real Madrid", size = 20)
plt.show()
Note that for Liverpool's pass network visualization, we subtract the x coordinates from 120 (the length of the pitch in StatsBomb coordinates) just to reverse the x-axis.
Now that we have successfully visualized the pass networks of both teams involved in the game, we will start analyzing our networks using metrics from the literature on complex network analysis.
Note that both of our networks are directed weighted graphs, with the number of passes as the weight of each directed edge.
Let us first build a graph isomorphic to the one we just visualized for Real Madrid, but this time using the networkx package. First we will select the relevant columns from the pass_Real_new dataset:
pass_Real_new = pass_Real_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Real_new
pass_maker | pass_receiver | number_of_passes | |
---|---|---|---|
0 | 14 | 2 | 1 |
1 | 7 | 2 | 3 |
2 | 22 | 2 | 2 |
3 | 9 | 2 | 2 |
4 | 10 | 2 | 10 |
... | ... | ... | ... |
73 | 2 | 1 | 1 |
74 | 9 | 1 | 1 |
75 | 5 | 1 | 2 |
76 | 4 | 1 | 1 |
77 | 8 | 1 | 1 |
78 rows × 3 columns
Now we convert pass_Real_new to a list of tuples, where each row is converted to a tuple. We will use this list as the edge list from which the networkx graph is built.
L_Real = pass_Real_new.apply(tuple, axis=1).tolist()
print(L_Real)
[('14', '2', 1), ('7', '2', 3), ('22', '2', 2), ('9', '2', 2), ('10', '2', 10), ('12', '2', 2), ('5', '2', 3), ('4', '2', 3), ('8', '2', 1), ('14', '10', 1), ('7', '10', 1), ('2', '10', 7), ('22', '10', 1), ('12', '10', 1), ('5', '10', 5), ('4', '10', 2), ('8', '10', 5), ('14', '12', 1), ('7', '12', 4), ('22', '12', 2), ('1', '12', 2), ('10', '12', 1), ('4', '12', 9), ('8', '12', 4), ('14', '5', 1), ('2', '5', 5), ('1', '5', 2), ('10', '5', 3), ('12', '5', 2), ('4', '5', 5), ('8', '5', 4), ('14', '4', 1), ('7', '4', 1), ('22', '4', 5), ('9', '4', 1), ('1', '4', 4), ('10', '4', 1), ('12', '4', 2), ('5', '4', 6), ('8', '4', 10), ('14', '8', 6), ('2', '8', 1), ('22', '8', 4), ('9', '8', 4), ('1', '8', 1), ('10', '8', 4), ('12', '8', 5), ('5', '8', 4), ('4', '8', 9), ('7', '9', 1), ('2', '9', 1), ('22', '9', 1), ('1', '9', 1), ('10', '9', 1), ('12', '9', 3), ('5', '9', 1), ('8', '9', 2), ('2', '14', 2), ('9', '14', 2), ('10', '14', 1), ('12', '14', 2), ('5', '14', 1), ('8', '14', 2), ('2', '7', 2), ('22', '7', 2), ('9', '7', 1), ('12', '7', 2), ('4', '7', 1), ('8', '7', 2), ('2', '22', 3), ('12', '22', 4), ('4', '22', 4), ('8', '22', 8), ('2', '1', 1), ('9', '1', 1), ('5', '1', 2), ('4', '1', 1), ('8', '1', 1)]
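The row-to-tuple conversion above can also be sketched with itertuples(), which avoids applying a Python function to every row. The two toy rows below copy the first two edges of the printed list.

```python
import pandas as pd

toy = pd.DataFrame({'pass_maker': ['14', '7'],
                    'pass_receiver': ['2', '2'],
                    'number_of_passes': [1, 3]})

# index=False leaves out the row index, name=None yields plain tuples
edge_list = list(toy.itertuples(index=False, name=None))
print(edge_list)
```

Each tuple has the form (pass maker, pass receiver, number of passes), which is the shape the graph-building loop expects.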
G_Real = nx.DiGraph()
for i in range(len(L_Real)):
    G_Real.add_edge(L_Real[i][0], L_Real[i][1], weight = L_Real[i][2])
edges_Real = G_Real.edges()
weights_Real = [G_Real[u][v]['weight'] for u, v in edges_Real]
nx.draw(G_Real, node_size=800, with_labels=True, node_color='white', width = weights_Real)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.title("Pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
For Liverpool too, let us first clean the pass_Liv_new dataset and then draw the isomorphic weighted directed graph:
pass_Liv_new = pass_Liv_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Liv_new
pass_maker | pass_receiver | number_of_passes | |
---|---|---|---|
0 | 5 | 26 | 4 |
1 | 7 | 26 | 1 |
2 | 14 | 26 | 1 |
3 | 1 | 26 | 1 |
4 | 66 | 26 | 1 |
... | ... | ... | ... |
59 | 66 | 6 | 1 |
60 | 4 | 6 | 3 |
61 | 7 | 19 | 2 |
62 | 14 | 19 | 1 |
63 | 11 | 19 | 1 |
64 rows × 3 columns
L_Liv = pass_Liv_new.apply(tuple, axis=1).tolist()
G_Liv = nx.DiGraph()
for i in range(len(L_Liv)):
    G_Liv.add_edge(L_Liv[i][0], L_Liv[i][1], weight = L_Liv[i][2])
edges_Liv = G_Liv.edges()
weights_Liv = [G_Liv[u][v]['weight'] for u, v in edges_Liv]
nx.draw(G_Liv, node_size = 800, with_labels = True, node_color = 'red', width = weights_Liv)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.show()
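The edge-by-edge loops used above can also be sketched with add_weighted_edges_from(), which takes the whole list of (source, target, weight) triples at once. The three toy edges below are taken from the printed Real Madrid list; G_toy is an illustrative name.

```python
import networkx as nx

# (maker, receiver, number_of_passes) triples in the same shape as L_Real
toy_edges = [('8', '4', 10), ('4', '8', 9), ('10', '2', 10)]

G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from(toy_edges)  # stores the third item under the 'weight' key

# Collect the weights in edge order, e.g. to pass them as widths when drawing
weights = [data['weight'] for _, _, data in G_toy.edges(data=True)]
print(G_toy['8']['4']['weight'], weights)
```

Because the graph is directed, the edge from 8 to 4 and the edge from 4 to 8 are stored separately, each with its own weight.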
Let us discuss some of the important functions from the networkx package that we have employed for drawing graphs:
DiGraph() is the base class for directed graphs, and calling it creates an empty directed graph,
add_edge() adds an edge from the node given by the first argument to the node given by the second, and the weight parameter sets the weight of this edge,
draw() visualizes a networkx graph; here node_size, node_color, with_labels and width set the node size, node color, node labels and edge widths.
Let us now understand the degree, indegree and outdegree of a node in a directed graph. The indegree of a node is the number of edges directed towards the node, i.e., in our case, the number of distinct teammates from whom a player (node) received at least one pass. Similarly, the outdegree is the number of edges directed outwards from the node, i.e., the number of distinct teammates to whom the player passed the ball. Finally, the degree of a node is the total number of edges connected to it (ignoring the directions of the edges), so the degree of a node is the sum of its indegree and outdegree. Since our graphs are weighted, we can also sum the edge weights instead of counting edges; these weighted degrees give the total number of passes received and made by a player. The degrees computed below are the unweighted ones.
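The difference between counting edges and counting passes is easiest to see on a small example. The sketch below uses an invented three-player graph and the in_degree(), out_degree() and degree() views of a networkx DiGraph, once without and once with the weight argument.

```python
import networkx as nx

G_toy = nx.DiGraph()
# A passes to B three times and to C once; B passes back to A twice
G_toy.add_weighted_edges_from([('A', 'B', 3), ('A', 'C', 1), ('B', 'A', 2)])

# Unweighted: number of edges, i.e. number of distinct passing links
links_in = dict(G_toy.in_degree())
links_out = dict(G_toy.out_degree())
links_total = dict(G_toy.degree())

# Weighted: sum of the edge weights, i.e. number of passes
passes_received = dict(G_toy.in_degree(weight='weight'))
passes_made = dict(G_toy.out_degree(weight='weight'))

print(links_in, links_out, links_total)
print(passes_received, passes_made)
```

Player A has a degree of 3 (two outgoing links and one incoming link) but is involved in 6 passes in total, so the two notions can rank players differently.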
We will use networkx to find out the node degrees from the pass network of Real Madrid.
# Prepare a dictionary with jersey numbers as the node ids,
# i.e, the dictionary keys and degrees as the dictionary values
deg_Real = dict(nx.degree(G_Real))
# convert a dictionary to a pandas dataframe
degree_Real = pd.DataFrame.from_dict(list(deg_Real.items()))
degree_Real.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Real
jersey_number | node_degree | |
---|---|---|
0 | 14 | 12 |
1 | 2 | 17 |
2 | 7 | 11 |
3 | 22 | 11 |
4 | 9 | 14 |
5 | 10 | 15 |
6 | 12 | 16 |
7 | 5 | 14 |
8 | 4 | 17 |
9 | 8 | 19 |
10 | 1 | 10 |
For Real Madrid in that game, we notice that the player with jersey number 8 (i.e., Toni Kroos) had the highest degree value of 19. Ranked second are the players with jersey numbers 2 and 4 with degree value 17, i.e., our favorite Spanish defenders 'Daniel Carvajal Ramos' and 'Sergio Ramos García' respectively. Tremendous! Let us use seaborn to visualize the deg_Real dictionary via a bar plot:
X = list(deg_Real.keys())
Y = list(deg_Real.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Real Madrid vs Liverpool", size = 16)
plt.show()
We repeat the same for Liverpool too:
# Prepare a dictionary with jersey numbers as the node ids,
# i.e, the dictionary keys and degrees as the dictionary values
deg_Liv = dict(nx.degree(G_Liv))
# convert a dictionary to a pandas dataframe
degree_Liv = pd.DataFrame.from_dict(list(deg_Liv.items()))
degree_Liv.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Liv
jersey_number | node_degree | |
---|---|---|
0 | 5 | 12 |
1 | 26 | 11 |
2 | 7 | 17 |
3 | 14 | 17 |
4 | 1 | 7 |
5 | 66 | 13 |
6 | 4 | 12 |
7 | 11 | 11 |
8 | 6 | 12 |
9 | 9 | 10 |
10 | 19 | 6 |
For Liverpool, the highest degree value of 17 is shared by the players with jersey numbers 14 and 7, i.e., 'Jordan Brian Henderson' and 'James Philip Milner' respectively. We will visualize the deg_Liv dictionary via a bar plot:
X = list(deg_Liv.keys())
Y = list(deg_Liv.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Liverpool vs Real Madrid", size = 16)
plt.show()
indeg_Real = dict(G_Real.in_degree())
indegree_Real = pd.DataFrame.from_dict(list(indeg_Real.items()))
indegree_Real.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Real.keys())
Y = list(indeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
indeg_Liv = dict(G_Liv.in_degree())
indegree_Liv = pd.DataFrame.from_dict(list(indeg_Liv.items()))
indegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Liv.keys())
Y = list(indeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
outdeg_Real = dict(G_Real.out_degree())
outdegree_Real = pd.DataFrame.from_dict(list(outdeg_Real.items()))
outdegree_Real.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Real.keys())
Y = list(outdeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
outdeg_Liv = dict(G_Liv.out_degree())
outdegree_Liv = pd.DataFrame.from_dict(list(outdeg_Liv.items()))
outdegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Liv.keys())
Y = list(outdeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
Let us compute the adjacency matrices of the G_Real and G_Liv graphs:
A_Real = nx.adjacency_matrix(G_Real)
A_Liv = nx.adjacency_matrix(G_Liv)
A_Real = A_Real.todense()
A_Liv = A_Liv.todense()
sns.heatmap(A_Real, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Real Madrid's pass network")
plt.show()
sns.heatmap(A_Liv, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Liverpool's pass network")
plt.show()
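To read such a heat map, it helps to know which node each row and column stands for. The sketch below, on a made-up three-node graph, uses to_numpy_array() with an explicit nodelist so that the ordering is fixed: entry (i, j) holds the weight of the edge from the i-th node to the j-th node, i.e., rows are pass makers and columns are pass receivers.

```python
import networkx as nx

G_toy = nx.DiGraph()
G_toy.add_edge('1', '4', weight = 3)
G_toy.add_edge('4', '7', weight = 2)
G_toy.add_edge('7', '4', weight = 1)

# Fix the row/column order explicitly
nodes = ['1', '4', '7']
A_toy = nx.to_numpy_array(G_toy, nodelist = nodes, weight = 'weight')
print(A_toy)
```

The same nodes list could then be passed to sns.heatmap() through its xticklabels and yticklabels parameters so that the axes show jersey numbers instead of positions.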
r_Real = nx.degree_pearson_correlation_coefficient(G_Real, weight = 'weight')
r_Liv = nx.degree_pearson_correlation_coefficient(G_Liv, weight = 'weight')
print(r_Real, r_Liv)
-0.17983836432860179 -0.2412372196699064
'weight' column in the pass network. Let us create a new graph for Real Madrid:
pass_Real_mod = pass_Real_new[['pass_maker', 'pass_receiver']].copy() # copy, so that adding a column does not trigger pandas' SettingWithCopyWarning
pass_Real_mod['1/nop'] = 1/pass_Real_new['number_of_passes']
pass_Real_mod.head(5)
pass_maker | pass_receiver | 1/nop | |
---|---|---|---|
0 | 14 | 2 | 1.000000 |
1 | 7 | 2 | 0.333333 |
2 | 22 | 2 | 0.500000 |
3 | 9 | 2 | 0.500000 |
4 | 10 | 2 | 0.100000 |
L_Real_mod = pass_Real_mod.apply(tuple, axis=1).tolist()
G_Real_mod = nx.DiGraph()
for i in range(len(L_Real_mod)):
    G_Real_mod.add_edge(L_Real_mod[i][0], L_Real_mod[i][1], weight = L_Real_mod[i][2])
edges_Real_mod = G_Real_mod.edges()
weights_Real_mod = [G_Real_mod[u][v]['weight'] for u, v in edges_Real_mod]
nx.draw(G_Real_mod, node_size=800, with_labels=True, node_color='white', width = weights_Real_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
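We do the same for Liverpool too:
pass_Liv_mod = pass_Liv_new[['pass_maker', 'pass_receiver']].copy() # copy, so that adding a column does not trigger pandas' SettingWithCopyWarning
pass_Liv_mod['1/nop'] = 1/pass_Liv_new['number_of_passes']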
pass_Liv_mod.head(5)
pass_maker | pass_receiver | 1/nop | |
---|---|---|---|
0 | 5 | 26 | 0.25 |
1 | 7 | 26 | 1.00 |
2 | 14 | 26 | 1.00 |
3 | 1 | 26 | 1.00 |
4 | 66 | 26 | 1.00 |
L_Liv_mod = pass_Liv_mod.apply(tuple, axis=1).tolist()
G_Liv_mod = nx.DiGraph()
for i in range(len(L_Liv_mod)):
    G_Liv_mod.add_edge(L_Liv_mod[i][0], L_Liv_mod[i][1], weight = L_Liv_mod[i][2])
edges_Liv_mod = G_Liv_mod.edges()
weights_Liv_mod = [G_Liv_mod[u][v]['weight'] for u, v in edges_Liv_mod]
nx.draw(G_Liv_mod, node_size=800, with_labels=True, node_color='red', width = weights_Liv_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Liverpool vs Real Madrid", size = 20)
plt.show()
Let us compute the shortest paths between every pair of players for Real Madrid:
dis_Real = nx.shortest_path(G_Real_mod, weight = 'weight')
print(dis_Real)
{'14': {'14': ['14'], '2': ['14', '8', '10', '2'], '10': ['14', '8', '10'], '12': ['14', '8', '4', '12'], '5': ['14', '8', '5'], '4': ['14', '8', '4'], '8': ['14', '8'], '9': ['14', '8', '9'], '7': ['14', '8', '7'], '22': ['14', '8', '22'], '1': ['14', '8', '5', '1']}, '2': {'2': ['2'], '10': ['2', '10'], '5': ['2', '5'], '8': ['2', '10', '8'], '9': ['2', '5', '4', '12', '9'], '14': ['2', '14'], '7': ['2', '7'], '22': ['2', '22'], '1': ['2', '5', '1'], '12': ['2', '5', '4', '12'], '4': ['2', '5', '4']}, '7': {'7': ['7'], '2': ['7', '2'], '10': ['7', '2', '10'], '12': ['7', '12'], '4': ['7', '12', '8', '4'], '9': ['7', '12', '9'], '5': ['7', '2', '5'], '8': ['7', '12', '8'], '14': ['7', '12', '14'], '22': ['7', '12', '22'], '1': ['7', '2', '5', '1']}, '22': {'22': ['22'], '2': ['22', '2'], '10': ['22', '8', '10'], '12': ['22', '4', '12'], '4': ['22', '4'], '8': ['22', '8'], '9': ['22', '4', '12', '9'], '7': ['22', '7'], '5': ['22', '4', '5'], '1': ['22', '4', '5', '1'], '14': ['22', '8', '14']}, '9': {'9': ['9'], '2': ['9', '2'], '4': ['9', '8', '4'], '8': ['9', '8'], '14': ['9', '14'], '7': ['9', '8', '7'], '1': ['9', '1'], '10': ['9', '8', '10'], '12': ['9', '8', '4', '12'], '5': ['9', '8', '5'], '22': ['9', '8', '22']}, '10': {'10': ['10'], '2': ['10', '2'], '12': ['10', '8', '4', '12'], '5': ['10', '2', '5'], '4': ['10', '8', '4'], '8': ['10', '8'], '9': ['10', '8', '9'], '14': ['10', '2', '14'], '7': ['10', '2', '7'], '22': ['10', '8', '22'], '1': ['10', '2', '5', '1']}, '12': {'12': ['12'], '2': ['12', '2'], '10': ['12', '8', '10'], '5': ['12', '8', '5'], '4': ['12', '8', '4'], '8': ['12', '8'], '9': ['12', '9'], '14': ['12', '14'], '7': ['12', '7'], '22': ['12', '22'], '1': ['12', '8', '5', '1']}, '5': {'5': ['5'], '2': ['5', '10', '2'], '10': ['5', '10'], '4': ['5', '4'], '8': ['5', '8'], '9': ['5', '4', '12', '9'], '14': ['5', '8', '14'], '1': ['5', '1'], '12': ['5', '4', '12'], '7': ['5', '8', '7'], '22': ['5', '8', '22']}, '4': {'4': ['4'], '2': ['4', 
'2'], '10': ['4', '8', '10'], '12': ['4', '12'], '5': ['4', '5'], '8': ['4', '8'], '7': ['4', '12', '7'], '22': ['4', '8', '22'], '1': ['4', '5', '1'], '9': ['4', '12', '9'], '14': ['4', '12', '14']}, '8': {'8': ['8'], '2': ['8', '10', '2'], '10': ['8', '10'], '12': ['8', '4', '12'], '5': ['8', '5'], '4': ['8', '4'], '9': ['8', '9'], '14': ['8', '14'], '7': ['8', '7'], '22': ['8', '22'], '1': ['8', '5', '1']}, '1': {'1': ['1'], '12': ['1', '4', '12'], '5': ['1', '4', '5'], '4': ['1', '4'], '8': ['1', '4', '8'], '9': ['1', '4', '12', '9'], '2': ['1', '4', '2'], '10': ['1', '4', '8', '10'], '7': ['1', '4', '12', '7'], '22': ['1', '4', '8', '22'], '14': ['1', '4', '12', '14']}}
Suppose we want the shortest path from 'Keylor Navas Gamboa' (jersey number 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey number 7). We will type the following:
print(dis_Real['1']['7'])
['1', '4', '12', '7']
This tells us that the shortest route for the ball from 'Keylor Navas Gamboa' (jersey: 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey: 7) was to pass the ball first to 'Sergio Ramos García' (jersey: 4), who would pass to 'Marcelo Vieira da Silva Júnior' (jersey: 12), with him ultimately passing to 'Cristiano Ronaldo dos Santos Aveiro'. This seems like a good post-match analysis tool. I got this idea after discussing with Sarath Babu.
For Liverpool:
dis_Liv = nx.shortest_path(G_Liv_mod, weight = 'weight')
print(dis_Liv)
{'5': {'5': ['5'], '26': ['5', '26'], '7': ['5', '26', '7'], '14': ['5', '14'], '4': ['5', '4'], '11': ['5', '11'], '66': ['5', '26', '7', '66'], '9': ['5', '26', '9'], '1': ['5', '14', '1'], '6': ['5', '14', '6'], '19': ['5', '26', '7', '19']}, '26': {'26': ['26'], '5': ['26', '5'], '7': ['26', '7'], '14': ['26', '14'], '9': ['26', '9'], '4': ['26', '4'], '11': ['26', '9', '11'], '66': ['26', '7', '66'], '1': ['26', '14', '1'], '6': ['26', '14', '6'], '19': ['26', '7', '19']}, '7': {'7': ['7'], '26': ['7', '66', '5', '26'], '5': ['7', '66', '5'], '14': ['7', '14'], '9': ['7', '66', '9'], '4': ['7', '4'], '1': ['7', '1'], '11': ['7', '66', '11'], '66': ['7', '66'], '6': ['7', '14', '6'], '19': ['7', '19']}, '14': {'14': ['14'], '26': ['14', '5', '26'], '5': ['14', '5'], '7': ['14', '7'], '4': ['14', '4'], '1': ['14', '1'], '66': ['14', '7', '66'], '6': ['14', '6'], '19': ['14', '7', '19'], '11': ['14', '7', '66', '11'], '9': ['14', '5', '26', '9']}, '1': {'1': ['1'], '26': ['1', '26'], '14': ['1', '14'], '4': ['1', '6', '4'], '6': ['1', '6'], '7': ['1', '6', '7'], '11': ['1', '6', '66', '11'], '66': ['1', '6', '66'], '5': ['1', '6', '66', '5'], '9': ['1', '6', '66', '9'], '19': ['1', '6', '7', '19']}, '66': {'66': ['66'], '26': ['66', '5', '26'], '5': ['66', '5'], '14': ['66', '14'], '9': ['66', '9'], '11': ['66', '11'], '6': ['66', '14', '6'], '7': ['66', '14', '7'], '4': ['66', '5', '4'], '19': ['66', '11', '19'], '1': ['66', '14', '1']}, '4': {'4': ['4'], '26': ['4', '26'], '5': ['4', '26', '5'], '14': ['4', '26', '14'], '66': ['4', '6', '66'], '6': ['4', '6'], '7': ['4', '26', '7'], '9': ['4', '26', '9'], '1': ['4', '6', '1'], '11': ['4', '6', '66', '11'], '19': ['4', '26', '7', '19']}, '11': {'11': ['11'], '5': ['11', '66', '5'], '7': ['11', '9', '7'], '9': ['11', '9'], '4': ['11', '4'], '66': ['11', '66'], '19': ['11', '19'], '14': ['11', '9', '14'], '6': ['11', '9', '14', '6'], '26': ['11', '66', '5', '26'], '1': ['11', '9', '14', '1']}, '6': {'6': ['6'], 
'7': ['6', '7'], '14': ['6', '66', '14'], '4': ['6', '4'], '1': ['6', '1'], '11': ['6', '66', '11'], '66': ['6', '66'], '26': ['6', '4', '26'], '5': ['6', '66', '5'], '9': ['6', '66', '9'], '19': ['6', '7', '19']}, '9': {'9': ['9'], '7': ['9', '7'], '14': ['9', '14'], '11': ['9', '11'], '66': ['9', '11', '66'], '6': ['9', '14', '6'], '5': ['9', '14', '5'], '4': ['9', '14', '4'], '19': ['9', '7', '19'], '26': ['9', '14', '5', '26'], '1': ['9', '14', '1']}, '19': {'19': ['19'], '7': ['19', '7'], '14': ['19', '14'], '9': ['19', '9'], '11': ['19', '9', '11'], '66': ['19', '9', '11', '66'], '6': ['19', '14', '6'], '5': ['19', '14', '5'], '4': ['19', '14', '4'], '26': ['19', '14', '5', '26'], '1': ['19', '14', '1']}}
print(dis_Liv['1']['9'])
['1', '6', '66', '9']
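The effect of using the reciprocal of the number of passes as the edge weight can be checked on a toy example. In the sketch below (made-up pass counts, hypothetical names), the goalkeeper '1' passed directly to '7' only once, but the route through '4' uses two heavily used connections. With weights 1/n, a frequently used connection is "short", so the indirect route wins.

```python
import networkx as nx

# (pass maker, pass receiver): number of passes; illustrative numbers only
toy_passes = {('1', '4'): 10, ('4', '7'): 5, ('1', '7'): 1}

G_toy = nx.DiGraph()
for (maker, receiver), n in toy_passes.items():
    # Many passes => small weight => short distance
    G_toy.add_edge(maker, receiver, weight = 1 / n)

path = nx.shortest_path(G_toy, source = '1', target = '7', weight = 'weight')
cost = nx.shortest_path_length(G_toy, source = '1', target = '7', weight = 'weight')
print(path, cost)
```

Here the route 1 → 4 → 7 costs 1/10 + 1/5 = 0.3, which is less than the direct edge with cost 1/1 = 1.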
The eccentricity of a player node p tells us how far the furthest player node from p is positioned in the pass network. Let us calculate the eccentricities for all the 11 nodes for Real Madrid.
E_Real = nx.eccentricity(G_Real_mod)
print(E_Real)
{'14': 2, '2': 2, '7': 2, '22': 2, '9': 2, '10': 2, '12': 2, '5': 2, '4': 2, '8': 1, '1': 2}
av_E_Real = sum(list(E_Real.values()))/len(E_Real)
print(av_E_Real)
1.9090909090909092
For Liverpool:
E_Liv = nx.eccentricity(G_Liv_mod)
print(E_Liv)
{'5': 2, '26': 2, '7': 1, '14': 2, '1': 2, '66': 2, '4': 2, '11': 2, '6': 2, '9': 2, '19': 2}
av_E_Liv = sum(list(E_Liv.values()))/len(E_Liv)
print(av_E_Liv)
1.9090909090909092
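A small made-up graph shows how these numbers arise. Called without further arguments, eccentricity() measures distance in number of edges (passes) and does not use the edge weights, which is why the values above are whole numbers. For a directed graph, every node must be reachable from every other node, otherwise networkx raises an error.

```python
import networkx as nx

G_toy = nx.DiGraph()
G_toy.add_edges_from([('a', 'b'), ('b', 'c'), ('c', 'a'), ('a', 'c')])

# 'a' reaches both other nodes in one step;
# 'b' needs two steps to reach 'a', and 'c' needs two steps to reach 'b'
ecc_toy = nx.eccentricity(G_toy)
avg_ecc_toy = sum(ecc_toy.values()) / len(ecc_toy)
print(ecc_toy, avg_ecc_toy)
```

A node with eccentricity 1, like 'a' here or jersey number 8 for Real Madrid above, has a direct edge to every other node.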
Let us compute the average clustering coefficient of G_Real (note that this graph should not be the modified version):
cc_Real = nx.average_clustering(G_Real, weight = 'weight')
print(cc_Real)
0.182334851979709
For Liverpool:
cc_Liv = nx.average_clustering(G_Liv, weight = 'weight')
print(cc_Liv)
0.27664278424505534
The average clustering coefficient is lower for Real Madrid's pass network, suggesting that fewer players passed the ball among each other compared to Liverpool.
Finally, let us compute the centrality (especially the betweenness centrality) for each node in either team's pass network and understand which player was the most important in their pass network. For Real Madrid:
bc_Real = nx.betweenness_centrality(G_Real, weight = 'weight')
print(bc_Real)
{'14': 0.15222222222222223, '2': 0.10685185185185186, '7': 0.05592592592592593, '22': 0.0, '9': 0.14462962962962964, '10': 0.12407407407407407, '12': 0.009259259259259259, '5': 0.007407407407407408, '4': 0.06851851851851852, '8': 0.031481481481481485, '1': 0.11703703703703704}
max_bc_Real = max(bc_Real, key = bc_Real.get)
print(max_bc_Real)
14
For Liverpool:
bc_Liv = nx.betweenness_centrality(G_Liv, weight = 'weight')
print(bc_Liv)
max_bc_Liv = max(bc_Liv, key = bc_Liv.get)
print(max_bc_Liv)
{'5': 0.06296296296296296, '26': 0.016666666666666666, '7': 0.2453703703703704, '14': 0.12407407407407407, '1': 0.002777777777777778, '66': 0.075, '4': 0.07222222222222222, '11': 0.05555555555555556, '6': 0.1259259259259259, '9': 0.021296296296296296, '19': 0.03888888888888889} 7
The players with the highest betweenness centrality are therefore 'Carlos Henrique Casimiro' (jersey: 14) from Real Madrid and 'James Philip Milner' (jersey: 7) from Liverpool. We have been able to compute some interesting results using complex network analysis on our pass networks.
Let us take another look at the events dataset:
events.head(12)
50_50 | ball_receipt_outcome | ball_recovery_recovery_failure | block_offensive | carry_end_location | clearance_aerial_won | clearance_body_part | clearance_head | clearance_left_foot | clearance_right_foot | ... | shot_statsbomb_xg | shot_technique | shot_type | substitution_outcome | substitution_replacement | tactics | team | timestamp | type | under_pressure | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | 00:00:00.000 | Starting XI | NaN |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | 00:00:00.000 | Starting XI | NaN |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.371 | Pass | NaN |
7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:03.275 | Pass | NaN |
8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:08.236 | Pass | NaN |
9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:10.701 | Pass | True |
10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:11.728 | Pass | NaN |
11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:15.994 | Pass | NaN |
12 rows × 86 columns
We keep only the following columns of the events dataset: 'team', 'type', 'minute', 'location', 'pass_end_location', 'pass_outcome', and 'player'.
events_pass = events[['team', 'type', 'minute', 'location',
'pass_end_location', 'pass_outcome', 'player']]
events_pass.head(10)
team | type | minute | location | pass_end_location | pass_outcome | player | |
---|---|---|---|---|---|---|---|
0 | Real Madrid | Starting XI | 0 | NaN | NaN | NaN | NaN |
1 | Liverpool | Starting XI | 0 | NaN | NaN | NaN | NaN |
2 | Real Madrid | Half Start | 0 | NaN | NaN | NaN | NaN |
3 | Liverpool | Half Start | 0 | NaN | NaN | NaN | NaN |
4 | Liverpool | Half Start | 45 | NaN | NaN | NaN | NaN |
5 | Real Madrid | Half Start | 45 | NaN | NaN | NaN | NaN |
6 | Liverpool | Pass | 0 | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
7 | Liverpool | Pass | 0 | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
8 | Real Madrid | Pass | 0 | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
9 | Real Madrid | Pass | 0 | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
The player column gives us the names of the players who were associated with different events during the match. Suppose we are only interested in generating the pass map and its corresponding heat map for a particular player, for example, 'Toni Kroos'. For that, we have to clean the events_pass dataset in such a way that we have only those rows where player = 'Toni Kroos'. Be very careful to use the exact spelling while performing these string comparisons: a misspelled name raises no error, it silently returns an empty dataframe. Before filtering, let us collect the names of all the players who were involved in this match.
players = events_pass.player.unique()
print(players)
[nan 'James Philip Milner' 'Dejan Lovren' 'Raphaël Varane' 'Luka Modrić' 'Daniel Carvajal Ramos' 'Carlos Henrique Casimiro' 'Jordan Brian Henderson' 'Sadio Mané' 'Roberto Firmino Barbosa de Oliveira' 'Mohamed Salah' 'Sergio Ramos García' 'Marcelo Vieira da Silva Júnior' 'Toni Kroos' 'Cristiano Ronaldo dos Santos Aveiro' 'Karim Benzema' 'Trent Alexander-Arnold' 'Keylor Navas Gamboa' 'Francisco Román Alarcón Suárez' 'Virgil van Dijk' 'Andrew Robertson' 'Georginio Wijnaldum' 'Loris Karius' 'Adam David Lallana' 'José Ignacio Fernández Iglesias' 'Gareth Frank Bale' 'Emre Can' 'Marco Asensio Willemsen']
We now filter the dataset for our chosen player ('Toni Kroos' in our case). One good practice is to simply copy the particular player name from the players list that we just generated and use it according to our needs. This way, spelling errors can be avoided. Filtering with Python is an easy process:
events_pass_p1 = events_pass[events_pass['player'] == 'Toni Kroos']
events_pass_p1.head(10)
team | type | minute | location | pass_end_location | pass_outcome | player | |
---|---|---|---|---|---|---|---|
19 | Real Madrid | Pass | 0 | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
28 | Real Madrid | Pass | 1 | [23.4, 18.6] | [14.9, 26.8] | NaN | Toni Kroos |
79 | Real Madrid | Pass | 5 | [35.0, 24.9] | [57.1, 6.6] | NaN | Toni Kroos |
85 | Real Madrid | Pass | 6 | [41.7, 21.7] | [43.2, 41.2] | NaN | Toni Kroos |
89 | Real Madrid | Pass | 6 | [50.6, 28.3] | [49.2, 5.5] | NaN | Toni Kroos |
106 | Real Madrid | Pass | 7 | [42.2, 11.1] | [50.6, 13.4] | NaN | Toni Kroos |
125 | Real Madrid | Pass | 9 | [48.7, 53.1] | [50.1, 63.3] | NaN | Toni Kroos |
126 | Real Madrid | Pass | 9 | [56.7, 59.6] | [48.8, 30.9] | NaN | Toni Kroos |
128 | Real Madrid | Pass | 9 | [56.4, 15.2] | [48.8, 28.2] | NaN | Toni Kroos |
138 | Real Madrid | Pass | 10 | [42.9, 9.4] | [28.8, 39.2] | NaN | Toni Kroos |
The type column in events_pass_p1 has event types other than passes, which we do not want for now. Thus, we have to again clean the dataset such that we have only those rows where type = 'Pass'. The other rows can be discarded for now. Before that, let us analyse what event types other than 'Pass' are available for 'Toni Kroos':
print(events_pass_p1.type.unique())
['Pass' 'Ball Receipt*' 'Carry' 'Ball Recovery' 'Pressure' 'Foul Won' 'Foul Committed' 'Dispossessed' 'Duel' 'Dribbled Past' 'Block']
We keep only the passes and reset the row index so that it starts from 0:
events_pass_p1 = events_pass_p1[events_pass_p1['type'] == 'Pass'].reset_index()
events_pass_p1
index | team | type | minute | location | pass_end_location | pass_outcome | player | |
---|---|---|---|---|---|---|---|---|
0 | 19 | Real Madrid | Pass | 0 | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
1 | 28 | Real Madrid | Pass | 1 | [23.4, 18.6] | [14.9, 26.8] | NaN | Toni Kroos |
2 | 79 | Real Madrid | Pass | 5 | [35.0, 24.9] | [57.1, 6.6] | NaN | Toni Kroos |
3 | 85 | Real Madrid | Pass | 6 | [41.7, 21.7] | [43.2, 41.2] | NaN | Toni Kroos |
4 | 89 | Real Madrid | Pass | 6 | [50.6, 28.3] | [49.2, 5.5] | NaN | Toni Kroos |
... | ... | ... | ... | ... | ... | ... | ... | ... |
87 | 975 | Real Madrid | Pass | 85 | [120.0, 80.0] | [116.1, 76.6] | NaN | Toni Kroos |
88 | 976 | Real Madrid | Pass | 85 | [120.0, 80.0] | [115.9, 77.3] | NaN | Toni Kroos |
89 | 978 | Real Madrid | Pass | 85 | [96.8, 73.1] | [75.4, 74.5] | NaN | Toni Kroos |
90 | 1026 | Real Madrid | Pass | 91 | [120.0, 0.1] | [91.5, 8.1] | NaN | Toni Kroos |
91 | 1038 | Real Madrid | Pass | 92 | [56.9, 41.5] | [84.8, 71.3] | NaN | Toni Kroos |
92 rows × 8 columns
We now have all the passes made by 'Toni Kroos' in the match. 'Toni Kroos' has been involved in 92 passes. We will later work out his pass success rate. But look at the number. Isn't he a brilliant midfielder that the German national team and the Real Madrid team have at their disposal? What a playmaker he is! Let us find out what all his pass outcomes were:
print(events_pass_p1.pass_outcome.unique())
[nan 'Out' 'Incomplete' 'Pass Offside']
Apart from nan, 'Toni Kroos' has Out, Incomplete and Pass Offside as pass outcomes. If we look closely, the events_pass_p1 dataframe has the minute column, which tells us at what minute the pass started from Kroos's end. It also has the location and the pass_end_location columns, informing us about the coordinates of Kroos when he passed the ball and the coordinates of where the ball ended up after the pass (successful or unsuccessful). Let us manipulate the pass_outcome column by replacing all the nan values, which we treat as completed passes, with 'Successful' with the help of the fillna() function provided by pandas. This will teach us the simplest way to handle nan values.
events_pass_p1['pass_outcome'] = events_pass_p1['pass_outcome'].fillna('Successful')
events_pass_p1
index | team | type | minute | location | pass_end_location | pass_outcome | player | |
---|---|---|---|---|---|---|---|---|
0 | 19 | Real Madrid | Pass | 0 | [48.8, 13.9] | [36.1, 56.3] | Successful | Toni Kroos |
1 | 28 | Real Madrid | Pass | 1 | [23.4, 18.6] | [14.9, 26.8] | Successful | Toni Kroos |
2 | 79 | Real Madrid | Pass | 5 | [35.0, 24.9] | [57.1, 6.6] | Successful | Toni Kroos |
3 | 85 | Real Madrid | Pass | 6 | [41.7, 21.7] | [43.2, 41.2] | Successful | Toni Kroos |
4 | 89 | Real Madrid | Pass | 6 | [50.6, 28.3] | [49.2, 5.5] | Successful | Toni Kroos |
... | ... | ... | ... | ... | ... | ... | ... | ... |
87 | 975 | Real Madrid | Pass | 85 | [120.0, 80.0] | [116.1, 76.6] | Successful | Toni Kroos |
88 | 976 | Real Madrid | Pass | 85 | [120.0, 80.0] | [115.9, 77.3] | Successful | Toni Kroos |
89 | 978 | Real Madrid | Pass | 85 | [96.8, 73.1] | [75.4, 74.5] | Successful | Toni Kroos |
90 | 1026 | Real Madrid | Pass | 91 | [120.0, 0.1] | [91.5, 8.1] | Successful | Toni Kroos |
91 | 1038 | Real Madrid | Pass | 92 | [56.9, 41.5] | [84.8, 71.3] | Successful | Toni Kroos |
92 rows × 8 columns
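The behaviour of fillna() is easy to verify on a small made-up column before trusting it on the full dataset. This is only a sketch; toy is a hypothetical dataframe, not part of the match data.

```python
import numpy as np
import pandas as pd

# Two missing outcomes and two unsuccessful ones (illustrative values)
toy = pd.DataFrame({'pass_outcome': [np.nan, 'Out', np.nan, 'Incomplete']})

# Replace every missing value with the label 'Successful'
toy['pass_outcome'] = toy['pass_outcome'].fillna('Successful')

n_missing = toy['pass_outcome'].isna().sum()
counts = toy['pass_outcome'].value_counts()
print(n_missing)
print(counts)
```

Checking isna().sum() after the replacement is a cheap way to confirm that no missing values remain.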
Loc = events_pass_p1['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['location_x', 'location_y'])
Loc_end = events_pass_p1['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_end_location_x', 'pass_end_location_y'])
events_pass_p1['location_x'] = Loc['location_x']
events_pass_p1['location_y'] = Loc['location_y']
events_pass_p1['pass_end_location_x'] = Loc_end['pass_end_location_x']
events_pass_p1['pass_end_location_y'] = Loc_end['pass_end_location_y']
events_pass_p1 = events_pass_p1[['minute', 'location_x', 'location_y',
'pass_end_location_x', 'pass_end_location_y', 'pass_outcome']]
events_pass_p1.head(8)
minute | location_x | location_y | pass_end_location_x | pass_end_location_y | pass_outcome | |
---|---|---|---|---|---|---|
0 | 0 | 48.8 | 13.9 | 36.1 | 56.3 | Successful |
1 | 1 | 23.4 | 18.6 | 14.9 | 26.8 | Successful |
2 | 5 | 35.0 | 24.9 | 57.1 | 6.6 | Successful |
3 | 6 | 41.7 | 21.7 | 43.2 | 41.2 | Successful |
4 | 6 | 50.6 | 28.3 | 49.2 | 5.5 | Successful |
5 | 7 | 42.2 | 11.1 | 50.6 | 13.4 | Successful |
6 | 9 | 48.7 | 53.1 | 50.1 | 63.3 | Successful |
7 | 9 | 56.7 | 59.6 | 48.8 | 30.9 | Successful |
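The column-splitting step above relies on the row index running from 0 upwards, which holds here because reset_index() was called earlier. A slightly more defensive variant, sketched below on made-up data, passes the original index to the new dataframe so that the rows stay aligned even when the index has gaps.

```python
import pandas as pd

# Toy frame whose index is NOT 0, 1, 2 (as it would be before reset_index())
toy = pd.DataFrame({'location': [[48.8, 13.9], [23.4, 18.6], [35.0, 24.9]]},
                   index = [19, 28, 79])

# Expand each [x, y] list into two columns, keeping the original index
xy = pd.DataFrame(toy['location'].tolist(),
                  columns = ['location_x', 'location_y'],
                  index = toy.index)

# Attach the new columns; rows are matched on the shared index
toy = toy.join(xy)
print(toy)
```

Without index = toy.index, the new columns would be labelled 0, 1, 2 and the assignment would fill the rows labelled 19, 28 and 79 with NaN.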
Let us now draw the pass map of Toni Kroos on a football pitch and also visualize its corresponding heat map.
pitch = Pitch(pitch_color = 'black', line_color = 'white', constrained_layout = True, tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
# Heat map code (x and y are passed by keyword, as recent seaborn versions require)
res = sns.kdeplot(x = events_pass_p1['location_x'], y = events_pass_p1['location_y'], fill = True,
                  thresh = 0.05, alpha = 0.5, levels = 10, cmap = 'Purples_d')
# Pass map code: green for successful passes, red for unsuccessful ones
for i in range(len(events_pass_p1)):
    if events_pass_p1.pass_outcome[i] == 'Successful':
        pitch.arrows(events_pass_p1.location_x[i], events_pass_p1.location_y[i], events_pass_p1.pass_end_location_x[i],
                     events_pass_p1.pass_end_location_y[i], ax=ax, color='green', width = 3)
        pitch.scatter(events_pass_p1.location_x[i], events_pass_p1.location_y[i], ax = ax, color = 'green')
    else:
        pitch.arrows(events_pass_p1.location_x[i], events_pass_p1.location_y[i], events_pass_p1.pass_end_location_x[i],
                     events_pass_p1.pass_end_location_y[i], ax=ax, color='red', width=3)
        pitch.scatter(events_pass_p1.location_x[i], events_pass_p1.location_y[i], ax = ax, color='red')
plt.title("Toni Kroos pass and heat map")
Text(0.5, 1.0, 'Toni Kroos pass and heat map')
In the kdeplot() function, the thresh value sets the lowest iso-proportion level at which the contour lines are to be drawn, levels sets the number of contour levels, fill sets whether to fill the area between the contours, alpha sets the transparency of the plot (default value is 1, less than 1 means more transparent), and cmap sets the color map. To study more about kdeplot(), refer to the seaborn documentation.
For 'Toni Kroos' let us calculate the percentage of successful and unsuccessful passes.
events_pass_p1['pass_outcome'].value_counts(normalize=True).mul(100)
Successful      91.304348
Incomplete       6.521739
Out              1.086957
Pass Offside     1.086957
Name: pass_outcome, dtype: float64
events_pass_p1['pass_outcome'].value_counts(normalize=True).mul(100).plot.bar()
<AxesSubplot:>
'Kroos' completed around 91.3% of his passes. Wild!
If we have a set of points X, then the convex hull is the smallest convex set that contains X. This will help us get an idea about the optimal field coverage of a player during the match.
We will use the scipy package, which provides us with a collection of modules for scientific computation with Python.
We need the scipy.spatial module, which allows us to work with spatial algorithms and data structures. As we are going to work with convex hulls first, let us import the ConvexHull class from scipy.spatial:
from scipy.spatial import ConvexHull
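Before applying ConvexHull to event locations, a tiny made-up point set shows what it returns. In the sketch below, four points form a unit square and a fifth lies inside it; the interior point is not part of the hull. For two-dimensional input, the volume attribute holds the enclosed area and the area attribute holds the perimeter.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of a unit square plus one interior point (illustrative data)
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])

hull = ConvexHull(pts)

# Indices (into pts) of the points that lie on the hull boundary
hull_idx = sorted(hull.vertices.tolist())
print(hull_idx)

# In 2-D: 'volume' is the enclosed area, 'area' is the perimeter
print(hull.volume, hull.area)
```

For a player, pts would be the array of (location_x, location_y) pairs, and the hull area gives a rough measure of the region in which the player's events took place.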
Let us select the columns we need from the events dataset:
events_hull = events[['team', 'location', 'type', 'player']]
events_hull.head(10)
team | location | type | player | |
---|---|---|---|---|
0 | Real Madrid | NaN | Starting XI | NaN |
1 | Liverpool | NaN | Starting XI | NaN |
2 | Real Madrid | NaN | Half Start | NaN |
3 | Liverpool | NaN | Half Start | NaN |
4 | Liverpool | NaN | Half Start | NaN |
5 | Real Madrid | NaN | Half Start | NaN |
6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner |
7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren |
8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane |
9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić |
We restrict the type to Pass or Shot.
events_hull = events_hull[(events_hull['type'] == 'Pass') | (events_hull['type'] == 'Shot')].reset_index()
events_hull.head(10)
index | team | location | type | player | |
---|---|---|---|---|---|
0 | 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner |
1 | 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren |
2 | 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane |
3 | 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić |
4 | 10 | Real Madrid | [22.3, 76.6] | Pass | Daniel Carvajal Ramos |
5 | 11 | Real Madrid | [36.2, 75.3] | Pass | Carlos Henrique Casimiro |
6 | 12 | Liverpool | [76.5, 18.1] | Pass | Jordan Brian Henderson |
7 | 13 | Liverpool | [84.4, 10.0] | Pass | Sadio Mané |
8 | 14 | Liverpool | [91.6, 21.3] | Pass | Roberto Firmino Barbosa de Oliveira |
9 | 15 | Liverpool | [92.2, 50.9] | Pass | Mohamed Salah |
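When more than two event types are of interest, chaining comparisons with | becomes unwieldy. A possible alternative, sketched here on made-up rows, is the isin() method of pandas. Passing drop = True to reset_index() additionally avoids the extra index column seen in the table above.

```python
import pandas as pd

# Toy events table (illustrative values only)
toy = pd.DataFrame({'type': ['Pass', 'Carry', 'Shot', 'Pressure', 'Pass'],
                    'player': ['A', 'A', 'B', 'B', 'C']})

# Keep rows whose type is in the given list, then renumber the rows from 0
toy_ps = toy[toy['type'].isin(['Pass', 'Shot'])].reset_index(drop = True)
print(toy_ps)
```

The notebook keeps the old index as a column and drops it later when selecting columns; both approaches lead to the same final table.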
We split the location column into location_x and location_y columns:
Loc = events_hull['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['location_x', 'location_y'])
events_hull['location_x'] = Loc['location_x']
events_hull['location_y'] = Loc['location_y']
events_hull.head(10)
index | team | location | type | player | location_x | location_y | |
---|---|---|---|---|---|---|---|
0 | 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner | 60.0 | 40.0 |
1 | 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren | 35.0 | 40.8 |
2 | 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane | 27.4 | 60.2 |
3 | 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić | 35.3 | 75.4 |
4 | 10 | Real Madrid | [22.3, 76.6] | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
5 | 11 | Real Madrid | [36.2, 75.3] | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
6 | 12 | Liverpool | [76.5, 18.1] | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
7 | 13 | Liverpool | [84.4, 10.0] | Pass | Sadio Mané | 84.4 | 10.0 |
8 | 14 | Liverpool | [91.6, 21.3] | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
9 | 15 | Liverpool | [92.2, 50.9] | Pass | Mohamed Salah | 92.2 | 50.9 |
We can now drop the location column:
events_hull = events_hull[['team', 'type', 'player', 'location_x', 'location_y']]
events_hull.head(10)
team | type | player | location_x | location_y | |
---|---|---|---|---|---|
0 | Liverpool | Pass | James Philip Milner | 60.0 | 40.0 |
1 | Liverpool | Pass | Dejan Lovren | 35.0 | 40.8 |
2 | Real Madrid | Pass | Raphaël Varane | 27.4 | 60.2 |
3 | Real Madrid | Pass | Luka Modrić | 35.3 | 75.4 |
4 | Real Madrid | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
5 | Real Madrid | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
6 | Liverpool | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
7 | Liverpool | Pass | Sadio Mané | 84.4 | 10.0 |
8 | Liverpool | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
9 | Liverpool | Pass | Mohamed Salah | 92.2 | 50.9 |
We split the dataset into two, one for Real Madrid and the other for Liverpool:
events_hull_Real = events_hull[events_hull['team'] == 'Real Madrid'].reset_index()
events_hull_Liv = events_hull[events_hull['team'] == 'Liverpool'].reset_index()
events_hull_Real.head(5)
index | team | type | player | location_x | location_y | |
---|---|---|---|---|---|---|
0 | 2 | Real Madrid | Pass | Raphaël Varane | 27.4 | 60.2 |
1 | 3 | Real Madrid | Pass | Luka Modrić | 35.3 | 75.4 |
2 | 4 | Real Madrid | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
3 | 5 | Real Madrid | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
4 | 10 | Real Madrid | Pass | Sergio Ramos García | 14.7 | 23.2 |
events_hull_Liv.head(5)
index | team | type | player | location_x | location_y | |
---|---|---|---|---|---|---|
0 | 0 | Liverpool | Pass | James Philip Milner | 60.0 | 40.0 |
1 | 1 | Liverpool | Pass | Dejan Lovren | 35.0 | 40.8 |
2 | 6 | Liverpool | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
3 | 7 | Liverpool | Pass | Sadio Mané | 84.4 | 10.0 |
4 | 8 | Liverpool | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
players_Real = events_hull_Real.player.unique()
players_Liv = events_hull_Liv.player.unique()
print(players_Real)
print(players_Liv)
['Raphaël Varane' 'Luka Modrić' 'Daniel Carvajal Ramos' 'Carlos Henrique Casimiro' 'Sergio Ramos García' 'Marcelo Vieira da Silva Júnior' 'Toni Kroos' 'Cristiano Ronaldo dos Santos Aveiro' 'Karim Benzema' 'Keylor Navas Gamboa' 'Francisco Román Alarcón Suárez' 'José Ignacio Fernández Iglesias' 'Gareth Frank Bale' 'Marco Asensio Willemsen'] ['James Philip Milner' 'Dejan Lovren' 'Jordan Brian Henderson' 'Sadio Mané' 'Roberto Firmino Barbosa de Oliveira' 'Mohamed Salah' 'Trent Alexander-Arnold' 'Virgil van Dijk' 'Andrew Robertson' 'Georginio Wijnaldum' 'Loris Karius' 'Adam David Lallana' 'Emre Can']
As an example, we extract the passes made by Toni Kroos from events_hull_Real.
events_hull_Toni = events_hull_Real[events_hull_Real['player'] == 'Toni Kroos']
events_hull_Toni
index | team | type | player | location_x | location_y | |
---|---|---|---|---|---|---|
7 | 13 | Real Madrid | Pass | Toni Kroos | 48.8 | 13.9 |
15 | 22 | Real Madrid | Pass | Toni Kroos | 23.4 | 18.6 |
30 | 73 | Real Madrid | Pass | Toni Kroos | 35.0 | 24.9 |
36 | 79 | Real Madrid | Pass | Toni Kroos | 41.7 | 21.7 |
40 | 83 | Real Madrid | Pass | Toni Kroos | 50.6 | 28.3 |
... | ... | ... | ... | ... | ... | ... |
638 | 969 | Real Madrid | Pass | Toni Kroos | 120.0 | 80.0 |
639 | 970 | Real Madrid | Pass | Toni Kroos | 120.0 | 80.0 |
641 | 972 | Real Madrid | Pass | Toni Kroos | 96.8 | 73.1 |
666 | 1020 | Real Madrid | Pass | Toni Kroos | 120.0 | 0.1 |
672 | 1032 | Real Madrid | Pass | Toni Kroos | 56.9 | 41.5 |
92 rows × 6 columns
We compute the interquartile range (IQR) of location_x and location_y from events_hull_Toni and then compute the lower and upper bounds of the data. Any point lying beyond these bounds, i.e., any point lying below the lower bound or above the upper bound, is treated as an outlier and discarded. We use box-and-whisker plots to visualize the interquartile range of the data points:
e_box = pd.DataFrame(data = events_hull_Toni, columns = ["location_x", "location_y"])
boxplot = sns.boxplot(x = "variable", y ="value", data=pd.melt(e_box),
                      order = ["location_x", "location_y"])
boxplot = sns.stripplot(x = "variable", y = "value", data = pd.melt(e_box), marker="o",
                        color="red", order = ["location_x", "location_y"])
boxplot.axes.set_title("Boxplot for Toni Kroos's location conditions")
plt.show()
Q1 = np.percentile(events_hull_Toni['location_x'], 25, interpolation='midpoint')
Q3 = np.percentile(events_hull_Toni['location_x'], 75, interpolation='midpoint')
IQR_x = Q3 - Q1
minimum_x = Q1 - 1.5*IQR_x
maximum_x = Q3 + 1.5*IQR_x
Q1, Q3, IQR_x, minimum_x, maximum_x
(47.400000000000006, 67.85, 20.44999999999999, 16.725000000000023, 98.52499999999998)
Q1 = np.percentile(events_hull_Toni['location_y'], 25, interpolation='midpoint')
Q3 = np.percentile(events_hull_Toni['location_y'], 75, interpolation='midpoint')
IQR_y = Q3 - Q1
minimum_y = Q1 - 1.5*IQR_y
maximum_y = Q3 + 1.5*IQR_y
Q1, Q3, IQR_y, minimum_y, maximum_y
(15.0, 41.8, 26.799999999999997, -25.199999999999996, 82.0)
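The bound computation above is repeated once per coordinate, so it can be wrapped in a small helper. The following is only a sketch under stated assumptions: the names tukey_fences and drop_outliers and the toy data are hypothetical, the quartiles use NumPy's default linear interpolation rather than the midpoint rule used above, and a row is dropped as soon as it is out of bounds on any one of the listed columns.

```python
import numpy as np
import pandas as pd

def tukey_fences(values, k=1.5):
    """Return (lower, upper) bounds Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])  # default linear interpolation
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(df, columns, k=1.5):
    """Keep rows whose value lies within the fences for every listed column."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        lower, upper = tukey_fences(df[col], k)
        keep &= df[col].between(lower, upper)
    return df[keep]

# Toy data with one obvious outlier in location_x and a non-contiguous index.
toy = pd.DataFrame({"location_x": [1, 2, 3, 4, 5, 6, 7, 8, 100],
                    "location_y": [10, 11, 12, 13, 14, 15, 16, 17, 18]},
                   index=[3, 8, 15, 22, 30, 41, 57, 60, 72])
print(tukey_fences(toy["location_x"]))
print(drop_outliers(toy, ["location_x", "location_y"]))
```

Because the filter is a boolean mask built on the frame's own index, it does not matter whether the index labels are contiguous.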
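# Flag rows with a boolean mask: np.where returns positions, but drop() expects index labels.
# As written, a pass is flagged only when it is out of bounds on both axes at once;
# replace & with | to flag a pass that is out of bounds on either axis.
upper = (events_hull_Toni['location_x'] >= maximum_x) & (events_hull_Toni['location_y'] >= maximum_y)
lower = (events_hull_Toni['location_x'] <= minimum_x) & (events_hull_Toni['location_y'] <= minimum_y)
events_hull_Toni = events_hull_Toni[~(upper | lower)]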
We reset the index of the events_hull_Toni dataset and keep only the relevant columns:
events_hull_Toni = events_hull_Toni.reset_index()
events_hull_Toni = events_hull_Toni[['team', 'type', 'player', 'location_x', 'location_y']]
events_hull_Toni.head(10)
team | type | player | location_x | location_y | |
---|---|---|---|---|---|
0 | Real Madrid | Pass | Toni Kroos | 48.8 | 13.9 |
1 | Real Madrid | Pass | Toni Kroos | 23.4 | 18.6 |
2 | Real Madrid | Pass | Toni Kroos | 35.0 | 24.9 |
3 | Real Madrid | Pass | Toni Kroos | 41.7 | 21.7 |
4 | Real Madrid | Pass | Toni Kroos | 50.6 | 28.3 |
5 | Real Madrid | Pass | Toni Kroos | 42.2 | 11.1 |
6 | Real Madrid | Pass | Toni Kroos | 48.7 | 53.1 |
7 | Real Madrid | Pass | Toni Kroos | 56.7 | 59.6 |
8 | Real Madrid | Pass | Toni Kroos | 56.4 | 15.2 |
9 | Real Madrid | Pass | Toni Kroos | 42.9 | 9.4 |
points_hull = events_hull_Toni[['location_x', 'location_y']].values
We compute the convex hull using the ConvexHull() function from scipy.spatial:
convex_hull_Toni = ConvexHull(events_hull_Toni[['location_x', 'location_y']])
The vertices attribute consists of the indices of the points in points_hull that make up the convex hull, and the simplices attribute likewise consists of indices into points_hull. The simplices are a list of 1-D simplices, each a pair of indices representing one line segment (edge) of the hull in 2-D. Let us print the indices:
print(convex_hull_Toni.vertices)
[50 41 55 75 84 1 67 51]
print(convex_hull_Toni.simplices)
[[50 41] [67 1] [84 1] [84 75] [55 41] [55 75] [51 50] [51 67]]
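To make the two attributes concrete, here is a minimal sketch on a toy point set (a unit square with one interior point, chosen only for illustration). For 2-D input, SciPy also reports the enclosed area as hull.volume and the perimeter as hull.area, which could serve as a rough measure of how much of the pitch a player's passes cover.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of a unit square plus one interior point (index 4).
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(points)

print(hull.vertices)    # indices of the hull points, counterclockwise for 2-D input
print(hull.simplices)   # one row per hull edge: the two point indices it joins
print(hull.volume)      # enclosed area for 2-D input
print(hull.area)        # perimeter for 2-D input
```

The interior point (index 4) appears in neither vertices nor simplices, which is why only the outermost pass locations shape the hull.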
pitch = Pitch(pitch_color='grass', stripe = True, line_color='black', goal_type='box',
              constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
plt.scatter(events_hull_Toni.location_x, events_hull_Toni.location_y, color='white')
for i in convex_hull_Toni.simplices:
    plt.plot(points_hull[i, 0], points_hull[i, 1], 'black')
plt.fill(points_hull[convex_hull_Toni.vertices, 0], points_hull[convex_hull_Toni.vertices, 1],
         c='grey', alpha=0.1)
plt.title("Convex Hull for Toni Kroos's field coverage against Liverpool")
So, we have been able to compute and visualize the convex hulls for players from a particular game. Next, we will see how to get tracking data for a particular game using the StatsBomb API. We need tracking data to compute Delaunay triangulations and Voronoi diagrams.
The match id that we have been working with is 18245.
We first need to import the read_event function and the EVENT_SLUG constant from the mplsoccer.statsbomb module:
from mplsoccer.statsbomb import read_event, EVENT_SLUG
event_json = read_event(f'{EVENT_SLUG}/18245.json', related_event_df = False,
tactics_lineup_df = False, warn = False)
event = event_json['event']
tracking = event_json['shot_freeze_frame']
Let us look at the event and tracking datasets:
event.head(5)
match_id | id | index | period | timestamp_minute | timestamp_second | timestamp_millisecond | minute | second | type_id | ... | injury_stoppage_in_chain | shot_statsbomb_xg | shot_key_pass_id | shot_first_time | shot_one_on_one | shot_redirect | substitution_replacement_id | substitution_replacement_name | tactics_formation | aerial_won | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18245 | 5eee3ffd-f0c0-4532-868b-4a66cbf20cb8 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 41212.0 | NaN |
1 | 18245 | eaa65a92-02d3-4375-b2b7-7c2f679a620c | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 433.0 | NaN |
2 | 18245 | 9c82d2e5-ebba-4825-b7f9-b11b04433ed8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 18245 | b791047a-3eea-452f-b3a9-212bd40cd7cb | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 18245 | 25be91a5-a084-42cb-8cc1-a0fe7b0f52f9 | 5 | 1 | 0 | 0 | 371 | 0 | 0 | 30 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 77 columns
event.tail(5)
match_id | id | index | period | timestamp_minute | timestamp_second | timestamp_millisecond | minute | second | type_id | ... | injury_stoppage_in_chain | shot_statsbomb_xg | shot_key_pass_id | shot_first_time | shot_one_on_one | shot_redirect | substitution_replacement_id | substitution_replacement_name | tactics_formation | aerial_won | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3492 | 18245 | b4258521-d4ec-466d-a90c-e4522692a45b | 3493 | 2 | 47 | 30 | 959 | 92 | 30 | 30 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3493 | 18245 | 37f51448-ebd1-4d67-8d9e-fa4b450111b2 | 3494 | 2 | 47 | 33 | 52 | 92 | 33 | 42 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3494 | 18245 | e9f7bb50-f4fc-45aa-87d3-20bbe9ebd32f | 3495 | 2 | 47 | 39 | 157 | 92 | 39 | 40 | ... | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3495 | 18245 | ce7d446a-e8bf-4631-bcf5-2bd323ba251e | 3496 | 2 | 48 | 2 | 893 | 93 | 2 | 34 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3496 | 18245 | d19b2348-de55-4bbf-9b1f-e44d95aa3a77 | 3497 | 2 | 48 | 2 | 893 | 93 | 2 | 34 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 77 columns
tracking.head(5)
id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | 1 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 98.0 | 48.4 | 18245 |
1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 1 | True | 3535 | Roberto Firmino Barbosa de Oliveira | 23 | Center Forward | 109.0 | 39.9 | 18245 |
2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | 1 | True | 3655 | Andrew Robertson | 6 | Left Back | 102.1 | 2.5 | 18245 |
3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | 1 | True | 4926 | Francisco Román Alarcón Suárez | 19 | Center Attacking Midfield | 100.2 | 11.0 | 18245 |
4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | 1 | True | 3629 | Sadio Mané | 21 | Left Wing | 90.9 | 32.3 | 18245 |
tracking.tail(5)
id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | |
---|---|---|---|---|---|---|---|---|---|---|
356 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 16 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 99.9 | 19.0 | 18245 |
357 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 17 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 99.2 | 50.3 | 18245 |
358 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 17 | False | 5201 | Sergio Ramos García | 5 | Left Center Back | 114.1 | 42.9 | 18245 |
359 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 18 | False | 5574 | Toni Kroos | 15 | Left Center Midfield | 102.7 | 37.0 | 18245 |
360 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 18 | False | 5485 | Raphaël Varane | 3 | Right Center Back | 114.4 | 37.3 | 18245 |
Looking at event and tracking, we see that the former represents the event data and the latter represents the tracking data. Let us look at the columns of the tracking dataset:
print(tracking.columns)
Index(['id', 'event_freeze_id', 'player_teammate', 'player_id', 'player_name', 'player_position_id', 'player_position_name', 'x', 'y', 'match_id'], dtype='object')
Looking at the tracking dataset, we see that the id column holds a unique id for a shot freeze frame, i.e., it identifies the moment when a particular player took a shot, along with the locations of the other players at that moment. Using the player_name column, we add a team column to the tracking dataset, telling us which team each player in the frame belongs to.
# Players found in players_Real are labelled Real Madrid; everyone else is labelled Liverpool.
tracking['team'] = np.where(tracking['player_name'].isin(players_Real), 'Real Madrid', 'Liverpool')
tracking.head(5)
id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | team | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | 1 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 98.0 | 48.4 | 18245 | Real Madrid |
1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 1 | True | 3535 | Roberto Firmino Barbosa de Oliveira | 23 | Center Forward | 109.0 | 39.9 | 18245 | Liverpool |
2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | 1 | True | 3655 | Andrew Robertson | 6 | Left Back | 102.1 | 2.5 | 18245 | Liverpool |
3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | 1 | True | 4926 | Francisco Román Alarcón Suárez | 19 | Center Attacking Midfield | 100.2 | 11.0 | 18245 | Real Madrid |
4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | 1 | True | 3629 | Sadio Mané | 21 | Left Wing | 90.9 | 32.3 | 18245 | Liverpool |
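The labelling step above treats every name that is not in players_Real as a Liverpool player, and players_Real only contains players who attempted a pass. A possible alternative, sketched here on toy data, is to build the lookup from the lineups, which sb.lineups returns as a dictionary keyed by team name. The frames and the names lineups, team_of and frames below are illustrative stand-ins, not the real match data.

```python
import pandas as pd

# Toy stand-in for sb.lineups(match_id=...): a dict mapping team name to a lineup frame.
lineups = {
    "Real Madrid": pd.DataFrame({"player_name": ["Luka Modrić", "Toni Kroos"]}),
    "Liverpool": pd.DataFrame({"player_name": ["Sadio Mané", "Andrew Robertson"]}),
}

# Invert it into a player -> team lookup.
team_of = {name: team
           for team, lineup in lineups.items()
           for name in lineup["player_name"]}

frames = pd.DataFrame({"player_name": ["Luka Modrić", "Sadio Mané", "Unknown Player"]})
frames["team"] = frames["player_name"].map(team_of)   # NaN when a name is missing
print(frames)
```

With this approach an unmatched name shows up as a missing value instead of being silently assigned to one of the teams.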
tracking = tracking[['id', 'player_name', 'x', 'y', 'team']]
tracking.head(5)
id | player_name | x | y | team | |
---|---|---|---|---|---|
0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | Luka Modrić | 98.0 | 48.4 | Real Madrid |
1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | Roberto Firmino Barbosa de Oliveira | 109.0 | 39.9 | Liverpool |
2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | Andrew Robertson | 102.1 | 2.5 | Liverpool |
3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | Francisco Román Alarcón Suárez | 100.2 | 11.0 | Real Madrid |
4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | Sadio Mané | 90.9 | 32.3 | Liverpool |
player_info = sb.lineups(match_id = 18245)
credentials were not supplied. open data access only
player_info has information about both teams. Let us fetch the lineup for Real Madrid first:
info_Real = player_info['Real Madrid']
info_Real
player_id | player_name | player_nickname | jersey_number | country | |
---|---|---|---|---|---|
0 | 4926 | Francisco Román Alarcón Suárez | Isco | 22 | Spain |
1 | 5200 | Lucas Vázquez Iglesias | Lucas Vázquez | 17 | Spain |
2 | 5201 | Sergio Ramos García | Sergio Ramos | 4 | Spain |
3 | 5202 | José Ignacio Fernández Iglesias | Nacho | 6 | Spain |
4 | 5207 | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo | 7 | Portugal |
5 | 5456 | Mateo Kovačić | None | 23 | Croatia |
6 | 5463 | Luka Modrić | None | 10 | Croatia |
7 | 5485 | Raphaël Varane | None | 5 | France |
8 | 5539 | Carlos Henrique Casimiro | Casemiro | 14 | Brazil |
9 | 5552 | Marcelo Vieira da Silva Júnior | Marcelo | 12 | Brazil |
10 | 5574 | Toni Kroos | None | 8 | Germany |
11 | 5597 | Keylor Navas Gamboa | Keylor Navas | 1 | Costa Rica |
12 | 5719 | Marco Asensio Willemsen | Marco Asensio | 20 | Spain |
13 | 5721 | Daniel Carvajal Ramos | Daniel Carvajal | 2 | Spain |
14 | 6399 | Gareth Frank Bale | Gareth Bale | 11 | Wales |
15 | 6704 | Theo Bernard François Hernández | Theo Hernández | 15 | France |
16 | 6706 | Francisco Casilla Cortés | Kiko Casilla | 13 | Spain |
17 | 19677 | Karim Benzema | None | 9 | France |
We keep only the player_name and jersey_number columns and build a dictionary:
info_Real = info_Real[['player_name', 'jersey_number']]
jerseys_Real = {}
for i in range(len(info_Real)):
    jerseys_Real[info_Real.player_name[i]] = str(info_Real.jersey_number[i])
print(jerseys_Real)
{'Francisco Román Alarcón Suárez': '22', 'Lucas Vázquez Iglesias': '17', 'Sergio Ramos García': '4', 'José Ignacio Fernández Iglesias': '6', 'Cristiano Ronaldo dos Santos Aveiro': '7', 'Mateo Kovačić': '23', 'Luka Modrić': '10', 'Raphaël Varane': '5', 'Carlos Henrique Casimiro': '14', 'Marcelo Vieira da Silva Júnior': '12', 'Toni Kroos': '8', 'Keylor Navas Gamboa': '1', 'Marco Asensio Willemsen': '20', 'Daniel Carvajal Ramos': '2', 'Gareth Frank Bale': '11', 'Theo Bernard François Hernández': '15', 'Francisco Casilla Cortés': '13', 'Karim Benzema': '9'}
We do the same for Liverpool:
info_Liv = player_info['Liverpool']
info_Liv = info_Liv[['player_name', 'jersey_number']]
jerseys_Liv = {}
for i in range(len(info_Liv)):
    jerseys_Liv[info_Liv.player_name[i]] = str(info_Liv.jersey_number[i])
print(jerseys_Liv)
{'Dejan Lovren': '6', 'James Philip Milner': '7', 'Emre Can': '23', 'Alberto Moreno Pérez': '18', 'Mohamed Salah': '11', 'Jordan Brian Henderson': '14', 'Roberto Firmino Barbosa de Oliveira': '9', 'Simon Mignolet': '22', 'Georginio Wijnaldum': '5', 'Dominic Solanke': '29', 'Sadio Mané': '19', 'Loris Karius': '1', 'Andrew Robertson': '26', 'Trent Alexander-Arnold': '66', 'Virgil van Dijk': '4', 'Adam David Lallana': '20', 'Ragnar Klavan': '17', 'Nathaniel Edwin Clyne': '2'}
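The same name-to-jersey dictionary can be built without an explicit loop. A minimal sketch on a three-row toy lineup (the frame and the names lineup and jerseys are made up for illustration; the jersey numbers match the Real Madrid table above):

```python
import pandas as pd

lineup = pd.DataFrame({"player_name": ["Toni Kroos", "Luka Modrić", "Karim Benzema"],
                       "jersey_number": [8, 10, 9]})

# zip pairs each name with its jersey number, converted to a string for annotation.
jerseys = dict(zip(lineup["player_name"], lineup["jersey_number"].astype(str)))
print(jerseys)   # {'Toni Kroos': '8', 'Luka Modrić': '10', 'Karim Benzema': '9'}
```

This form does not depend on the frame having a 0, 1, 2, ... index, unlike positional lookups by label.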
Next, we select an id from the tracking dataset, representing an instance when a particular shot was taken. We will filter tracking by an id value, which will give us the locations of the players on the pitch at that moment. We can view the unique id values:
tracking.id.unique()
array(['682270cc-4bc4-4952-8f91-d3c5a704a691', '9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9', '399ac143-5f7b-4080-8c0b-3c18435d7fc1', '660d9d98-46b6-4b5e-9c9a-435d63142c93', 'fe6c7f60-2ff0-4077-882e-b045c8abc7c3', 'eda7e108-2479-46f2-9cd0-a0bc2939e352', 'c36dfe04-2f8e-48f0-8df6-1c4d0b93a16e', '3e93f456-9971-4a33-9b10-ee9961410a32', '9def9ed2-52f0-496b-8ae8-f4c5a97c2d8a', '20b934f1-9afa-401d-9a16-f97fea2b80d9', '6711367a-6855-4914-903e-a5e19771429c', 'e8c20962-0eef-4066-97ce-dcaad4f70b52', '02f0755f-76cf-4d30-8062-369dc9509bdd', '6cb4171b-90e6-4473-831e-df7a2da29f28', '93c40040-ab9a-4549-8f0e-46c5c1c8e9cd', '142e18c8-316a-4f9f-a0f8-3c41549ad1c3', '6f994944-70fc-4a30-acca-315e3fede0bb', '7654fe57-734f-45d8-bc83-ab940cd37c45', '30a872eb-fe88-4c46-858b-a4f487cb69e4', '53b73ee0-8c9c-4b64-83c5-69fc453376a1', '804f8c8e-d714-4e6a-9cd1-599665efb8c8', '36687201-f131-4418-9dd0-f632bc9c4257', '650a2dc2-e5bb-4fac-9259-afbc03bdc322', '312f9c86-6a3c-42b1-bdeb-f92cb1b16a48', '222c90b6-8293-409a-ac6d-e2c3c2e69948', 'c7f3935c-23fa-4ddc-a6ee-eb9d0972d034', '05688a6e-37f8-4aa6-a36e-d8151aa75997', '18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80'], dtype=object)
shot_id = '3e93f456-9971-4a33-9b10-ee9961410a32' # select a particular value from the id column
tracking_filtered = tracking[tracking['id'] == shot_id] # freeze-frame rows for this shot
event_filtered = event[event['id'] == shot_id] # the shot event itself, i.e. the shot taker's row
event_filtered = event_filtered[['id', 'player_name', 'x', 'y', 'team_name']]
event_filtered = event_filtered.rename(columns = {'team_name':'team'}) # match the tracking column name
data_filtered = pd.concat([event_filtered, tracking_filtered]) # shot taker first, then the other players
The data_filtered dataset looks like this:
data_filtered
id | player_name | x | y | team | |
---|---|---|---|---|---|
747 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 | Real Madrid |
7 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Loris Karius | 118.1 | 45.0 | Liverpool |
35 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 | Liverpool |
63 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Daniel Carvajal Ramos | 100.9 | 50.2 | Real Madrid |
91 | 3e93f456-9971-4a33-9b10-ee9961410a32 | James Philip Milner | 91.3 | 28.4 | Liverpool |
119 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Karim Benzema | 108.9 | 37.9 | Real Madrid |
147 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Georginio Wijnaldum | 105.7 | 56.5 | Liverpool |
175 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Jordan Brian Henderson | 108.0 | 50.0 | Liverpool |
202 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Virgil van Dijk | 111.7 | 54.7 | Liverpool |
228 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Trent Alexander-Arnold | 105.2 | 35.3 | Liverpool |
254 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Dejan Lovren | 111.8 | 41.1 | Liverpool |
280 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Toni Kroos | 91.0 | 30.3 | Real Madrid |
304 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Francisco Román Alarcón Suárez | 102.4 | 40.6 | Real Madrid |
Given a set X of points in the 2-D Euclidean plane, a Delaunay triangulation is a triangulation such that no point in X lies inside the circumcircle of any triangle in the triangulation. A representation of a Delaunay triangulation from the same Wikipedia article:
We import Delaunay from scipy.spatial to compute the triangulation:
from scipy.spatial import Delaunay
We split data_filtered by team:
tracking_Real = data_filtered[data_filtered['team'] == 'Real Madrid'].reset_index()
tracking_Liv = data_filtered[data_filtered['team'] == 'Liverpool'].reset_index()
tracking_Real
index | id | player_name | x | y | team | |
---|---|---|---|---|---|---|
0 | 747 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 | Real Madrid |
1 | 63 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Daniel Carvajal Ramos | 100.9 | 50.2 | Real Madrid |
2 | 119 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Karim Benzema | 108.9 | 37.9 | Real Madrid |
3 | 280 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Toni Kroos | 91.0 | 30.3 | Real Madrid |
4 | 304 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Francisco Román Alarcón Suárez | 102.4 | 40.6 | Real Madrid |
tracking_Liv
index | id | player_name | x | y | team | |
---|---|---|---|---|---|---|
0 | 7 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Loris Karius | 118.1 | 45.0 | Liverpool |
1 | 35 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 | Liverpool |
2 | 91 | 3e93f456-9971-4a33-9b10-ee9961410a32 | James Philip Milner | 91.3 | 28.4 | Liverpool |
3 | 147 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Georginio Wijnaldum | 105.7 | 56.5 | Liverpool |
4 | 175 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Jordan Brian Henderson | 108.0 | 50.0 | Liverpool |
5 | 202 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Virgil van Dijk | 111.7 | 54.7 | Liverpool |
6 | 228 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Trent Alexander-Arnold | 105.2 | 35.3 | Liverpool |
7 | 254 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Dejan Lovren | 111.8 | 41.1 | Liverpool |
points_Real = tracking_Real[['x', 'y']].values
print(points_Real)
[[111.7 58.7] [100.9 50.2] [108.9 37.9] [ 91. 30.3] [102.4 40.6]]
del_Real = Delaunay(tracking_Real[['x', 'y']])
loc_Real = tracking_Real[['player_name','x', 'y']].reset_index()
loc_Liv = tracking_Liv[['player_name','x', 'y']].reset_index()
loc_Real
index | player_name | x | y | |
---|---|---|---|---|
0 | 0 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 |
1 | 1 | Daniel Carvajal Ramos | 100.9 | 50.2 |
2 | 2 | Karim Benzema | 108.9 | 37.9 |
3 | 3 | Toni Kroos | 91.0 | 30.3 |
4 | 4 | Francisco Román Alarcón Suárez | 102.4 | 40.6 |
loc_Liv
index | player_name | x | y | |
---|---|---|---|---|
0 | 0 | Loris Karius | 118.1 | 45.0 |
1 | 1 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 |
2 | 2 | James Philip Milner | 91.3 | 28.4 |
3 | 3 | Georginio Wijnaldum | 105.7 | 56.5 |
4 | 4 | Jordan Brian Henderson | 108.0 | 50.0 |
5 | 5 | Virgil van Dijk | 111.7 | 54.7 |
6 | 6 | Trent Alexander-Arnold | 105.2 | 35.3 |
7 | 7 | Dejan Lovren | 111.8 | 41.1 |
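The empty-circumcircle property from the definition of the Delaunay triangulation can be checked numerically. The sketch below uses the five Real Madrid coordinates printed earlier for this freeze frame; the helper names circumcircle and is_delaunay are hypothetical and written only for this illustration.

```python
import numpy as np
from scipy.spatial import Delaunay

# Real Madrid coordinates as printed above for the selected freeze frame.
points = np.array([[111.7, 58.7], [100.9, 50.2], [108.9, 37.9],
                   [91.0, 30.3], [102.4, 40.6]])
tri = Delaunay(points)

def circumcircle(a, b, c):
    """Return (centre, radius) of the circle through the 2-D points a, b, c."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    centre = np.array([ux, uy])
    return centre, np.linalg.norm(a - centre)

def is_delaunay(points, simplices, tol=1e-9):
    """True if no point lies strictly inside any triangle's circumcircle."""
    for simplex in simplices:
        centre, radius = circumcircle(*points[simplex])
        others = np.delete(points, simplex, axis=0)
        if np.any(np.linalg.norm(others - centre, axis=1) < radius - tol):
            return False
    return True

print(tri.simplices)                       # each row: indices of one triangle's corners
print(is_delaunay(points, tri.simplices))  # the check passes for SciPy's triangulation
```

Each row of tri.simplices indexes into points, which is why the plotting call can pass the simplices directly together with the x and y coordinates.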
pitch = Pitch(pitch_color='grass', stripe=True, line_color='white', view = 'half', figsize=(8, 9),
              constrained_layout=True, tight_layout=False, goal_type='box')
fig, ax = pitch.draw()
plt.scatter(tracking_Real.x, tracking_Real.y, color='white', s = 400, edgecolors='black', zorder=2)
plt.scatter(tracking_Liv.x, tracking_Liv.y, color='red', edgecolors='black', s = 400)
plt.triplot(points_Real[:, 0], points_Real[:, 1], del_Real.simplices.copy(), 'k-', lw = 4)
for index, row in loc_Real.iterrows():
    pitch.annotate(jerseys_Real[loc_Real['player_name'][row.name]], xy=(row.x, row.y), c ='black',
                   va = 'center', ha = 'center', size = 14, ax = ax)
for index, row in loc_Liv.iterrows():
    pitch.annotate(jerseys_Liv[loc_Liv['player_name'][row.name]], xy=(row.x, row.y), c ='black',
                   va = 'center', ha = 'center', size = 14, ax = ax)
The red nodes indicate Liverpool's players and the white nodes indicate those of Real Madrid. The black lines indicate the direct links between the players of a particular team at a particular moment, forming the Delaunay triangulation, also called the pass triangulation. In his book Soccermatics, Dr. Sumpter mentions that these lines have two useful indications: first, they portray the availability of passes among the players of a particular team, and second, they also indicate the "no man's lines" for the players of the opposition team, meaning that an opposition player standing on one of these linking lines is at a disadvantage. A beautiful application of computational geometry, isn't it?
Voronoi diagrams, for a set X of points, partition a 2-D Euclidean space into regions, one per point, such that each region contains the locations that are closer to its point than to any other point in X. Look at the image of a Voronoi diagram (taken from here), which is the dual of the Delaunay triangulation shown above.
This time we work with the whole data_filtered dataset, because we need the locations of all the players on the pitch. We import Voronoi for computing the Voronoi diagram and voronoi_plot_2d to plot it on a pitch.
from scipy.spatial import Voronoi, voronoi_plot_2d
We flip the y-coordinates of data_filtered and compute the Voronoi diagram:
data_filtered['y'] = 80 - data_filtered['y']
points = data_filtered[['x', 'y']].values
vor = Voronoi(points)
pitch = Pitch(pitch_color='grass', stripe=True, line_color='white', view = 'half', figsize=(8,9),
              constrained_layout=True, tight_layout=False, goal_type='box')
fig, ax = pitch.draw()
plt.scatter(tracking_Real.x, 80 - tracking_Real.y, color='white', s = 1050, edgecolors='black', zorder=2)
plt.scatter(tracking_Liv.x, 80 -tracking_Liv.y, color='red', edgecolors='black', s = 1050)
pl = voronoi_plot_2d(vor, ax=ax, show_vertices=False, line_width = 8)
for index, row in loc_Real.iterrows():
    pitch.annotate(jerseys_Real[loc_Real['player_name'][row.name]], xy=(row.x, 80 - row.y),
                   c ='black', va = 'center', ha = 'center', size = 15, ax = ax)
for index, row in loc_Liv.iterrows():
    pitch.annotate(jerseys_Liv[loc_Liv['player_name'][row.name]], xy=(row.x, 80 - row.y),
                   c ='black', va = 'center', ha = 'center', size = 15, ax = ax)
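Since a player's Voronoi region is the set of locations nearer to that player than to anyone else, the share of the pitch each team controls can be approximated by assigning grid cells to the nearest player. The following is a rough sketch under stated assumptions: the function name territory_share, the default half-pitch ranges (x from 60 to 120 and y from 0 to 80, matching the StatsBomb coordinates used here) and the two toy players are hypothetical.

```python
import numpy as np

def territory_share(points, teams, x_range=(60, 120), y_range=(0, 80), step=1.0):
    """Approximate each team's share of the area by giving each grid cell to the nearest player."""
    xs = np.arange(x_range[0] + step / 2, x_range[1], step)   # cell centres along x
    ys = np.arange(y_range[0] + step / 2, y_range[1], step)   # cell centres along y
    grid = np.array([(x, y) for x in xs for y in ys])
    # Distance from every cell centre to every player; the nearest player owns the cell.
    dists = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    owner = np.asarray(teams)[dists.argmin(axis=1)]
    return {team: float(np.mean(owner == team)) for team in sorted(set(teams))}

# Two hypothetical players placed symmetrically about x = 90 on the attacking half.
points = np.array([[75.0, 40.0], [105.0, 40.0]])
print(territory_share(points, ["Liverpool", "Real Madrid"]))
# {'Liverpool': 0.5, 'Real Madrid': 0.5}
```

For the freeze frame above, one could pass data_filtered[['x', 'y']].values and data_filtered['team'].tolist() instead of the toy inputs; a finer step gives a closer approximation of the clipped Voronoi cell areas at the cost of more computation.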